Abstract



PGA, short for ProGram Algebra [PvdZ06, BL02], describes sequential programs
as finite or infinite (repeating) sequences of instructions. The semigroup C of finite
instruction sequences [BP09a] was introduced as an equally expressive alternative to
PGA. PGA instructions are executed from left to right; most C instructions come in a 
left-to-right as well as a right-to-left flavor. This thesis builds on C by introducing an 
alternative semigroup Cg which employs label and goto instructions instead of relative 
jump instructions as control structures. Cg can be translated to C and vice versa (and 
is thus equally expressive). It is shown that restricting the instruction sets of C and 
Cg to contain only finitely many distinct jump, goto or label instructions in either or 
both directions reduces their expressiveness. Instruction sets with an infinite number 
of these instructions in both directions (not necessarily all such instructions) do not 
suffer a loss of expressiveness. 
Chapter 1



Introduction 



Bergstra and Ponse [BP09a] introduce an algebra of finite instruction sequences by presenting
a semigroup C in which programs can be represented without directional bias: in terms
of the next instruction to be executed, C has both forward and backward instructions and
a C-expression can be interpreted starting from any instruction.

[BP09a] provides equations for thread extraction, i.e. C's program semantics, and defines
behavioral equivalence. It considers thread extraction compatible (anti-)homomorphisms
and (anti-)automorphisms. Lastly, it discusses some expressiveness results.

C is a recent alternative to PGA [PvdZ06, BL02], short for ProGram Algebra. Contrary
to C, PGA uses infinite instruction sequences to model infinite behavior. Since both PGA 
and C are tools that aid in the research on imperative sequential programming, and given 
that any "real world" programs are always finite, C appears to be a more realistic approach 
to a mathematical representation for sequential programs. 

This thesis introduces PGA and C and describes their semantics. It then defines an 
alternative to C called Cg which uses label and goto instructions as control structures, 
as opposed to C's relative jump instructions. Behavior preserving mappings are defined 
between PGA, C and Cg, thereby establishing that they are equally expressive. 

The final chapter of this thesis investigates the expressiveness of subsemigroups of C and 
Cg, particularly those from which a finite or infinite number of jump or goto instructions 
has been removed, thereby improving on an expressiveness result presented in [BP09a].

Lastly, the reader should take note of the appendix, which provides a graphical
representation of some of the (single-pass) instruction sequences defined in this thesis and the
mappings between them.
Chapter 2



Preliminaries 



In this chapter we introduce the concepts on which the remainder of this thesis builds. In §2.1
basic thread algebra is introduced. This allows us to describe the semantics of instruction
sequences. Next, §2.2 and §2.3 introduce two different takes on the way in which instruction
sequences can be represented: on the one hand there is PGA which describes finite or 
infinite single-pass instruction sequences; on the other hand we can take the (arguably more 
natural) stance that all instruction sequences must be finite while allowing instructions to 
be executed multiple times. It is the latter theory which describes instruction sequence 
semigroups, two concrete instances of which will be introduced in the following chapters as 
C and Cg. 

2.1 Basic Thread Algebra 

Basic thread algebra, BTA for short, is a means to describe the behavior of sequential 
programs upon execution. BTA takes the position that program execution consists of a 
sequence of basic actions which are performed inside some execution environment. It is 
assumed that a fixed but arbitrary set of basic actions A is specified; this parameter is often 
kept implicit. Upon execution of an action the execution environment yields a boolean reply, 
the value of which specifies how execution should proceed. 

In this section we will briefly introduce basic thread algebra. For more on this subject
we refer to [PvdZ06, BP09a, BL02].¹

BTA expressions are called threads. The set of all threads is denoted BTA. For any set 
A, threads are built using two constants and a single ternary operator: 

• The deadlock constant D : BTA.

• The termination constant S : BTA.

• The postconditional composition operator $\_ \unlhd \_ \unrhd \_ : \mathrm{BTA} \times A \times \mathrm{BTA} \to \mathrm{BTA}$.

It follows that each closed BTA expression performs finitely many actions and then
terminates or becomes inactive (in the case of deadlock).

For $P \in \mathrm{BTA}$ and $a \in A$, the thread $P \unlhd a \unrhd P$ is often more conveniently denoted $a \circ P$.
The action prefix operator $\circ$ can be used only if the boolean reply returned after execution
of $a$ does not influence further behavior. Action prefix binds stronger than postconditional
composition. Additionally, for all $n \geq 1$ we define $a^n \circ P$ to mean the thread which
performs $n$ $a$-actions, followed by the behavior described by the thread $P$. That is,
$a^1 \circ P = a \circ P$ and $a^{n+1} \circ P = a \circ (a^n \circ P)$.

¹In [BL02], BTA is called BPPA.
The approximation operator $\pi : \mathbb{N} \times \mathrm{BTA} \to \mathrm{BTA}$ returns the behavior of a given thread
up to a specified "depth"², i.e., it bounds the number of actions performed. For all $P, Q \in \mathrm{BTA}$
and $a \in A$ we define,

\[
\begin{aligned}
\pi(0, P) &= \mathsf{D}, \\
\pi(n+1, \mathsf{S}) &= \mathsf{S}, \\
\pi(n+1, \mathsf{D}) &= \mathsf{D}, \\
\pi(n+1, P \unlhd a \unrhd Q) &= \pi(n, P) \unlhd a \unrhd \pi(n, Q).
\end{aligned}
\]

From now on we will write $\pi_n(P)$ instead of $\pi(n, P)$ for brevity. Since every BTA thread is
finite, it follows that for every $P \in \mathrm{BTA}$ there exists some $n \in \mathbb{N}$ such that for all $m \in \mathbb{N}$,

\[ \pi_n(P) = \pi_{n+m}(P) = P. \]
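
The defining equations of the approximation operator translate almost literally into code. The following is a minimal sketch, assuming a hypothetical encoding in which the strings "D" and "S" stand for deadlock and termination and a triple (P, a, Q) stands for the postconditional composition of P and Q on action a; the names `prefix` and `approx` are illustrative only.

```python
# Threads: "D" (deadlock), "S" (termination), or a triple (P, a, Q)
# standing for the postconditional composition P <| a |> Q.

def prefix(a, p):
    """Action prefix a o P, i.e. P <| a |> P."""
    return (p, a, p)

def approx(n, p):
    """pi(n, P): cut the thread P off after at most n actions."""
    if n == 0:
        return "D"
    if p in ("S", "D"):
        return p
    left, action, right = p
    return (approx(n - 1, left), action, approx(n - 1, right))

# S <| a |> (b o S): terminates on reply true, performs b first on false.
example = ("S", "a", prefix("b", "S"))
```

For this example thread, the approximations stabilize from depth 3 onward, which illustrates the observation that every finite thread equals all of its sufficiently deep approximations.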

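The inclusion relation $\sqsubseteq$ on threads in BTA is the partial ordering generated by the following
two clauses:

• For all $P \in \mathrm{BTA}$, $\mathsf{D} \sqsubseteq P$.

• For all $P, P', Q, Q' \in \mathrm{BTA}$ and $a \in A$, if $P \sqsubseteq P'$ and $Q \sqsubseteq Q'$ then
$P \unlhd a \unrhd Q \sqsubseteq P' \unlhd a \unrhd Q'$.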
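
The two clauses can be checked structurally on finite threads. The sketch below assumes a hypothetical encoding of threads as the strings "D" and "S" or triples (P, a, Q) for postconditional composition; the function name `included` is not taken from the literature.

```python
def included(p, q):
    """Return True iff thread p is included in thread q (finite threads only)."""
    if p == "D":
        return True                      # D is included in every thread
    if p == "S" or q in ("S", "D"):
        return p == q                    # S is only included in itself
    p_left, p_action, p_right = p
    q_left, q_action, q_right = q
    return (p_action == q_action
            and included(p_left, q_left)
            and included(p_right, q_right))
```

Intuitively, a thread is included in another if the latter can be obtained by replacing occurrences of deadlock with further behavior.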

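BTA has a completion $\mathrm{BTA}^\infty$ which also comprises the infinite threads. $\mathrm{BTA}^\infty$ is the
cpo consisting of all projective sequences. We define,

\[ \mathrm{BTA}^\infty = \{ (P_n)_{n \in \mathbb{N}} \mid \forall n \in \mathbb{N}\, (P_n \in \mathrm{BTA} \wedge \pi_n(P_{n+1}) = P_n) \}. \]

Now $(P_n)_{n \in \mathbb{N}} = (Q_n)_{n \in \mathbb{N}}$ if $P_n = Q_n$ for all $n \in \mathbb{N}$. Furthermore we overload notation and
define,

\[
\begin{aligned}
\mathsf{D} &= (\mathsf{D}, \mathsf{D}, \ldots), \\
\mathsf{S} &= (\mathsf{D}, \mathsf{S}, \mathsf{S}, \ldots), \\
(P_n)_{n \in \mathbb{N}} \unlhd a \unrhd (Q_n)_{n \in \mathbb{N}} &= (R_n)_{n \in \mathbb{N}}, \text{ with } R_0 = \mathsf{D} \text{ and } R_{n+1} = P_n \unlhd a \unrhd Q_n.
\end{aligned}
\]

This definition also shows how all elements of BTA have a counterpart in $\mathrm{BTA}^\infty$. The
projective sequence corresponding to a thread $P \in \mathrm{BTA}$ is $(\pi_n(P))_{n \in \mathbb{N}}$.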

The set RES(P) of residual threads of $P$ has the following inductive definition:

\[ P \in \mathrm{RES}(P), \qquad Q \unlhd a \unrhd R \in \mathrm{RES}(P) \implies Q \in \mathrm{RES}(P) \wedge R \in \mathrm{RES}(P). \tag{2.1} \]

Depending on the execution environment a residual thread may be "reached" by performing
zero or more actions.

A thread $P$ is regular if RES(P) is finite. Regular threads are also called finite state
threads. Every element of RES(P) is a state. We write $\mathrm{BTA}^{\mathrm{reg}} \subset \mathrm{BTA}^\infty$ for the set of
regular threads.

A finite linear recursive specification over $\mathrm{BTA}^\infty$ is a set of equations

\[ x_i = t_i \]

for $i \in I$ with $I$ a finite index set, variables $x_i$ and all $t_i$ terms of the form S, D or $x_j \unlhd a \unrhd x_k$
with $j, k \in I$ and $a \in A$. $P \in \mathrm{BTA}^{\mathrm{reg}}$ iff $P$ is the solution of a finite linear recursive specification
(see Theorem 1 of [BP09a]).



²In this thesis we will use the convention that $\mathbb{N}$ is the set of all natural numbers, including 0, and
$\mathbb{N}^+ = \mathbb{N} - \{0\}$. The integers are denoted $\mathbb{Z}$.



2.2 Program Algebra: PGA

A program can be viewed as a single-pass instruction sequence. That is, a program is a
finite or infinite sequence of instructions which is executed from left to right such that every
individual instruction is executed at most once: it is either executed or skipped. Single-pass
instruction sequences are the main concept underlying PGA [PvdZ06, BL02]. Given an
(implicit) set $A$ of actions, PGA terms are constructed by concatenating instructions from
the set $\mathfrak{I}$, defined as,

\[ \mathfrak{I} = \bigcup_{a \in A} \{a, +a, -a\} \cup \bigcup_{k \in \mathbb{N}} \{\#k\} \cup \{!\}. \]

The instructions in $\mathfrak{I}$ are called primitive instructions. Let us informally define their behavior
(note that $a \in A$ and $k \in \mathbb{N}$):

a is a basic instruction. It instructs the execution environment to perform action a. The 
boolean reply returned by the environment is disregarded. 

+a is a positive test instruction. Like a, it instructs execution of action a. However, only 
if the execution environment returns true will the instruction to its immediate right 
be executed. Otherwise this instruction is skipped and execution proceeds at the next 
instruction. 

-a is a negative test instruction. This is the dual of the positive test instruction, in
the sense that it skips the next instruction iff the environment returns true after
performing action a.

#k is a forward jump instruction. This instruction transfers execution to the kth instruction
to its right (i.e., k - 1 instructions are skipped). Note that #0 instructs the
indefinite repetition of this instruction. Hence the behavior of #0 is identified with
deadlock.

! is the termination instruction. It causes successful termination of the program. 

The set of PGA terms is denoted $P$. PGA terms are constructed from primitive instructions
using the binary concatenation operator _;_ and the unary repetition operator $\_^\omega$. That
is, $P$ is the smallest superset of $\mathfrak{I}$ that is closed under concatenation and repetition. Thus,
for all $X, Y \in P$, also $X;Y \in P$ and $X^\omega \in P$. Examples of PGA terms include:

\[ a, \qquad +b;\#3, \qquad (\#3;a;b)^\omega, \qquad -c;-c;(-a)^\omega. \tag{2.2} \]



2.2.1 First Canonical Form 

We define $X^1 = X$ and $X^{n+1} = X; X^n$, for all $n \in \mathbb{N}^+$. Using this notation, PGA defines the
following four axioms for all $X, Y, Z \in P$:

\[
\begin{aligned}
(X;Y);Z &= X;(Y;Z) && \text{(PGA1)} \\
(X^n)^\omega &= X^\omega && \text{(PGA2)} \\
X^\omega;Y &= X^\omega && \text{(PGA3)} \\
(X;Y)^\omega &= X;(Y;X)^\omega && \text{(PGA4)}
\end{aligned}
\]

These four axioms define instruction sequence congruence. Instruction sequence congruent
PGA expressions execute exactly the same instructions and are thus behaviorally equivalent.
In the remainder of this thesis instruction sequence congruent PGA terms are identified.

(PGA1) states that concatenation is associative. Using (PGA2) and (PGA4) we derive
that $X^\omega = (X;X)^\omega = X;(X;X)^\omega = X;X^\omega$ for all $X \in P$. Furthermore, using (PGA1) - (PGA4) every PGA term
can be rewritten to one of the following two forms:
1. $X$, where $X$ does not contain the repetition operator, or

2. $X; Y^\omega$, with $X$ and $Y$ not containing the repetition operator.

Any PGA term in one of these two forms is said to be in first canonical form. The set
$P_1 \subset P$ contains exactly those PGA terms which are in first canonical form. The function
FST : $P \to P_1$ converts any given PGA term to a first canonical form. Let $X_1, X_2, Y_1, Y_2 \in P$,
such that $X_1$ and $X_2$ do not contain repetition. Then FST can be defined such that,

\[
\begin{aligned}
\mathrm{FST}(Y_1^\omega) &= \mathrm{FST}(Y_1; Y_1^\omega), & \mathrm{FST}(X_1) &= X_1, \\
\mathrm{FST}(Y_1^\omega; Y_2) &= \mathrm{FST}(Y_1^\omega), & \mathrm{FST}(X_1; X_2^\omega) &= X_1; X_2^\omega, \\
\mathrm{FST}(X_1; Y_1^\omega; Y_2) &= \mathrm{FST}(X_1; Y_1^\omega).
\end{aligned}
\]

It is not hard to see that FST is total and makes use only of (PGA1) - (PGA4).
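
One way to realize such a function is sketched below. This is not the thesis's own definition: it assumes a hypothetical term encoding (a string is a primitive instruction, ("cat", X, Y) is the concatenation X;Y, and ("rep", X) is the repetition of X) and returns a pair (prefix, loop) of instruction lists, where loop is None if the canonical form contains no repetition.

```python
def _split(term):
    """Return (prefix, loop) such that term is congruent to prefix;loop^omega."""
    if isinstance(term, str):
        return [term], None
    if term[0] == "cat":
        prefix, loop = _split(term[1])
        if loop is not None:
            return prefix, loop          # X^omega;Y = X^omega (PGA3)
        tail_prefix, tail_loop = _split(term[2])
        return prefix + tail_prefix, tail_loop
    # term[0] == "rep"
    prefix, loop = _split(term[1])
    if loop is not None:
        return prefix, loop              # U^omega = U;U^omega = U if U repeats
    return [], prefix

def fst(term):
    """First canonical form of a PGA term as (prefix, loop)."""
    prefix, loop = _split(term)
    if loop is not None and not prefix:
        prefix = list(loop)              # Y^omega = Y;Y^omega
    return prefix, loop
```

The final unfolding step mirrors the first equation for FST, so that a canonical form with repetition always has a non-empty, repetition-free part in front of the repeating part.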



2.2.2 Second Canonical Form 

Another congruence relation defined on PGA terms is structural congruence. It is defined
using the following four axioms, which are concerned with chained jump instructions in PGA
terms in first canonical form:

\[
\begin{aligned}
\#n{+}1; u_1; \ldots; u_n; \#0 &= \#0; u_1; \ldots; u_n; \#0, && \text{(PGA5)} \\
\#n{+}1; u_1; \ldots; u_n; \#m &= \#n{+}m{+}1; u_1; \ldots; u_n; \#m, && \text{(PGA6)} \\
(\#k{+}n{+}1; u_1; \ldots; u_n)^\omega &= (\#k; u_1; \ldots; u_n)^\omega, && \text{(PGA7)}
\end{aligned}
\]

and,

\[
\begin{aligned}
\#n{+}m{+}k{+}2; u_1; \ldots; u_n; (v_1; \ldots; v_{m+1})^\omega &= \\
\#n{+}k{+}1; u_1; \ldots; u_n; (v_1; \ldots; v_{m+1})^\omega. & && \text{(PGA8)}
\end{aligned}
\]

Using (PGA1) - (PGA8) every PGA term in first canonical form can be rewritten to a structurally
congruent PGA term without chained jump instructions (this also implies that the
jump counter of jump instructions into and inside the repeating part of a PGA term is
minimal). Such a term is said to be in second canonical form. As with first canonical forms,
second canonical forms are not unique. However, any second canonical form $X; Y^\omega$ can be
converted to an equivalent second canonical form $X'; Y'^\omega$ where $X'$ and $Y'$ are minimal.
Then $X'; Y'^\omega$ is unique.

The set $P_2 \subset P_1$ contains exactly those PGA terms which are in second canonical form.
The function SND : $P \to P_2$ converts any PGA term to its minimal second canonical form.
We do not provide an implementation here.



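2.2.3 The Semantics of PGA

Every PGA term $X \in P$ has uniquely defined behavior, in the form of some thread $T \in \mathrm{BTA}^{\mathrm{reg}}$.
The thread extraction operator $|\_|_{\mathrm{PGA}} : P \to \mathrm{BTA}^{\mathrm{reg}}$ yields this thread, for every
PGA term. It is defined as,

\[
|X|_{\mathrm{PGA}} =
\begin{cases}
a \circ \mathsf{D} & \text{if } X \in \{a, +a, -a\}, \\
a \circ |Y|_{\mathrm{PGA}} & \text{if } X = a;Y, \\
|Y|_{\mathrm{PGA}} \unlhd a \unrhd |\#2;Y|_{\mathrm{PGA}} & \text{if } X = +a;Y, \\
|\#2;Y|_{\mathrm{PGA}} \unlhd a \unrhd |Y|_{\mathrm{PGA}} & \text{if } X = -a;Y, \\
|Y|_{\mathrm{PGA}} & \text{if } X = \#1;Y, \\
|\#k{+}1;Y|_{\mathrm{PGA}} & \text{if } X = \#k{+}2;u;Y, \\
\mathsf{D} & \text{if } X \in \{\#k, \#0;Y, \#k{+}2;u\}, \\
\mathsf{S} & \text{if } X \in \{!, !;Y\}.
\end{cases}
\tag{2.3}
\]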
Note that this definition does not explicitly mention the repetition operator. Instead it uses
the notion that $X^\omega$ is "unfolded" when needed, by means of (PGA4) and possibly (PGA2).
Thread extraction on PGA terms requires one additional rule:

    If the equations in (2.3) can be applied infinitely often from left to right
    without ever yielding an action, then the extracted thread is D.        (2.4)

Observe that (2.4) is only relevant for PGA terms which contain an infinite sequence of
chained jump instructions. As such it is not applicable to second canonical forms.
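
Equations (2.3) and rule (2.4) can be turned into a small interpreter that computes a finite approximation of the extracted thread. The sketch below is illustrative only and rests on several assumptions: the term is given in first canonical form as two lists of instruction strings (`prefix` and `loop`, the latter empty if there is no repetition), threads are encoded as "D", "S" or triples (P, a, Q), and an endless chain of jumps is detected by revisiting a jump position before any action is performed.

```python
def extract(prefix, loop, depth, pos=0):
    """Approximate |prefix;loop^omega|_PGA up to `depth` actions (0-based pos)."""
    if depth == 0:
        return "D"
    visited = set()
    while True:
        if pos >= len(prefix):
            if not loop:
                return "D"            # ran past the end of a finite term
            pos = len(prefix) + (pos - len(prefix)) % len(loop)
        ins = prefix[pos] if pos < len(prefix) else loop[pos - len(prefix)]
        if ins == "!":
            return "S"
        if not ins.startswith("#"):
            break                     # a basic or test instruction
        jump = int(ins[1:])
        if jump == 0 or pos in visited:
            return "D"                # #0, or rule (2.4): endless jump chain
        visited.add(pos)
        pos += jump
    nxt = extract(prefix, loop, depth - 1, pos + 1)
    skip = extract(prefix, loop, depth - 1, pos + 2)
    if ins.startswith("+"):
        return (nxt, ins[1:], skip)   # reply true: next instruction
    if ins.startswith("-"):
        return (skip, ins[1:], nxt)   # reply true: skip one instruction
    return (nxt, ins, nxt)            # basic instruction: reply ignored
```

Positions inside the repeating part are normalized modulo its length, which is what makes the cycle check for rule (2.4) terminate.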

Examples Let us apply the thread extraction operator $|\_|_{\mathrm{PGA}}$ to the example PGA terms
of (2.2).

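• The behavior of the term $a$ can be derived in a single step according to (2.3):

\[ |a|_{\mathrm{PGA}} = a \circ \mathsf{D}. \]

• $+b; \#3$ appears to be a more complicated example, but its behavior turns out to be
equally simple:

\[ |{+b}; \#3|_{\mathrm{PGA}} = |\#3|_{\mathrm{PGA}} \unlhd b \unrhd |\#2; \#3|_{\mathrm{PGA}} = \mathsf{D} \unlhd b \unrhd \mathsf{D} = b \circ \mathsf{D}. \]

• It turns out that the single-pass instruction sequence $(\#3; a; b)^\omega$ does not perform any
action, despite its infinite length:

\[ |(\#3; a; b)^\omega|_{\mathrm{PGA}} = |(\#0; a; b)^\omega|_{\mathrm{PGA}} = |\#0; (a; b; \#0)^\omega|_{\mathrm{PGA}} = \mathsf{D}. \]

Observe that the first step of this derivation applies (PGA7), followed by an application
of (PGA4).

• Lastly, $-c; -c; (-a)^\omega$ produces infinite behavior. To determine its exact behavior, we
start out with a couple of left-to-right applications of (2.3):

\[
\begin{aligned}
|{-c}; -c; (-a)^\omega|_{\mathrm{PGA}} &= |\#2; -c; (-a)^\omega|_{\mathrm{PGA}} \unlhd c \unrhd |{-c}; (-a)^\omega|_{\mathrm{PGA}} \\
&= |(-a)^\omega|_{\mathrm{PGA}} \unlhd c \unrhd (|\#2; (-a)^\omega|_{\mathrm{PGA}} \unlhd c \unrhd |(-a)^\omega|_{\mathrm{PGA}}) \\
&= |(-a)^\omega|_{\mathrm{PGA}} \unlhd c \unrhd (|\#2; -a; (-a)^\omega|_{\mathrm{PGA}} \unlhd c \unrhd |(-a)^\omega|_{\mathrm{PGA}}) \\
&= |(-a)^\omega|_{\mathrm{PGA}} \unlhd c \unrhd (|(-a)^\omega|_{\mathrm{PGA}} \unlhd c \unrhd |(-a)^\omega|_{\mathrm{PGA}}) \\
&= |(-a)^\omega|_{\mathrm{PGA}} \unlhd c \unrhd c \circ |(-a)^\omega|_{\mathrm{PGA}}.
\end{aligned}
\]

At this stage the behavior of $-c; -c; (-a)^\omega$ has not been fully derived, as the thread
corresponding to $(-a)^\omega$ still needs to be determined. This thread turns out to be
infinite:

\[
\begin{aligned}
|(-a)^\omega|_{\mathrm{PGA}} &= |{-a}; (-a)^\omega|_{\mathrm{PGA}} \\
&= |\#2; (-a)^\omega|_{\mathrm{PGA}} \unlhd a \unrhd |(-a)^\omega|_{\mathrm{PGA}} \\
&= |\#2; -a; (-a)^\omega|_{\mathrm{PGA}} \unlhd a \unrhd |(-a)^\omega|_{\mathrm{PGA}} \\
&= |(-a)^\omega|_{\mathrm{PGA}} \unlhd a \unrhd |(-a)^\omega|_{\mathrm{PGA}} \\
&= a \circ |(-a)^\omega|_{\mathrm{PGA}}.
\end{aligned}
\]

It follows that $|(-a)^\omega|_{\mathrm{PGA}}$ can be described by the recursive specification $Q = a \circ Q$.
Now, equating $|{-c}; -c; (-a)^\omega|_{\mathrm{PGA}}$ with $P_1$, we see that the behavior of $-c; -c; (-a)^\omega$
equals $P_1$, as described by the following linear recursive specification:

\[ P_1 = P_3 \unlhd c \unrhd P_2, \qquad P_2 = P_3 \unlhd c \unrhd P_3, \qquad P_3 = P_3 \unlhd a \unrhd P_3. \]

(A shorter notation would be $P = Q \unlhd c \unrhd c \circ Q$, $Q = a \circ Q$.)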
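
Although the solution of such a specification is an infinite thread, each of its finite approximations can be computed by unfolding the equations. The following sketch assumes a hypothetical encoding of a linear recursive specification as a dictionary that maps each variable to "S", "D" or a triple (x_j, a, x_k); the name `unfold` is illustrative.

```python
# The specification P1 = P3 <| c |> P2, P2 = P3 <| c |> P3, P3 = P3 <| a |> P3.
SPEC = {
    "P1": ("P3", "c", "P2"),
    "P2": ("P3", "c", "P3"),
    "P3": ("P3", "a", "P3"),
}

def unfold(spec, var, depth):
    """Approximation up to `depth` actions of the thread solving `spec` for `var`."""
    if depth == 0:
        return "D"
    rhs = spec[var]
    if rhs in ("S", "D"):
        return rhs
    left, action, right = rhs
    return (unfold(spec, left, depth - 1), action, unfold(spec, right, depth - 1))
```

At depth 2 the unfolding of P1 shows a c-action followed by an a-action on reply true and by a second c-action on reply false, in agreement with the derivation above.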






Proposition 2.1. Each thread definable in PGA is regular, and each regular thread can be
expressed in PGA.

Proof. See e.g. Proposition 2 in [PvdZ06]. Alternatively, the result follows from the following
two observations:

• The code semigroup C introduced in Chapter 3 characterizes the regular threads (see
Proposition 3.1).

• There exist total behavior preserving mappings from PGA to C and vice versa (see
§5.2 and §5.1, respectively). □

2.3 Finite Instruction Sequences and Code Semigroups 

In PGA each instruction is executed at most once and the repetition operator is used to 
construct infinite sequences of instructions. The instruction sequence semigroups introduced 
in the following chapters, on the other hand, represent only finite instruction sequences in 
which instructions can be executed multiple times in any order. This section introduces 
some relevant notions and terminology in preparation for the introduction of concrete code
semigroups in the following chapters.

2.3.1 Finite Instruction Sequences 

Consider a non-empty instruction set $\mathcal{I}$ and an associative binary operation _;_ on $\mathcal{I}$. We
will call _;_ the concatenation operator. Instructions can be concatenated, thereby yielding
finite instruction sequences (inseqs) of arbitrary length. For all $n \in \mathbb{N}^+$, let

\[ \mathcal{I}^1 = \mathcal{I}, \qquad \mathcal{I}^{n+1} = \{ X;u \mid X \in \mathcal{I}^n, u \in \mathcal{I} \}. \]

Then $\mathcal{I}^n$ is the set of instruction sequences of length $n$. We define

\[ \mathcal{I}^+ = \bigcup_{n \in \mathbb{N}^+} \mathcal{I}^n. \]

$\mathcal{I}^+$ contains all finite, non-empty (length greater than zero) sequences of $\mathcal{I}$-instructions. $X$
is an $\mathcal{I}$-inseq iff $X \in \mathcal{I}^+$. An $\mathcal{I}$-inseq will also be called an $\mathcal{I}$-expression. We call $\ell : \mathcal{I}^+ \to \mathbb{N}^+$
the length function, and it is defined such that $\ell(X) = n$ iff $X \in \mathcal{I}^n$.

Concatenation is an associative operation, thus $(X;Y);Z = X;(Y;Z)$ for arbitrary
$X, Y, Z \in \mathcal{I}^+$. Parentheses will therefore usually be omitted, and we write $X;Y;Z$. Note
also that it trivially follows that for arbitrary $n, m \geq 1$,

\[ \mathcal{I}^{n+m} = \{ X;Y \mid X \in \mathcal{I}^n, Y \in \mathcal{I}^m \}. \]

For convenience, we will write $\mathcal{I}^{\leq n}$ for the set of all $\mathcal{I}$-expressions up to length $n$. Likewise
$\mathcal{I}^{\geq n}$ contains all $\mathcal{I}$-expressions of length $n$ or greater. That is,

\[ \mathcal{I}^{\leq n} = \{ X \in \mathcal{I}^+ \mid \ell(X) \leq n \}, \qquad \mathcal{I}^{\geq n} = \{ X \in \mathcal{I}^+ \mid \ell(X) \geq n \}. \]

For all $i \in \mathbb{N}^+$, we define auxiliary functions $\sigma_i : \mathcal{I}^{\geq i} \to \mathcal{I}$ which return the $i$th instruction
in a given $\mathcal{I}$-inseq. That is, if $X = u_1; u_2; \ldots; u_n$, then $\sigma_i(X) = u_i$ for all $1 \leq i \leq n$. We
define $i =_X j$ iff $\sigma_i(X) = \sigma_j(X)$. Clearly $=_X$ is an equivalence relation.

Next, for all $X \in \mathcal{I}^+$ and $U \subseteq \mathcal{I}$ we define $U(X) = \{ i \mid \sigma_i(X) \in U \}$. In other words,
$U(X)$ contains the positions in the $\mathcal{I}$-inseq $X$ of instructions contained in $U$.

It will sometimes prove convenient to regard an inseq $X$ as a set whose elements are the
distinct instructions contained in $X$. So for any $X \in \mathcal{I}^+$ we write $u \in X$ to indicate that
$\sigma_i(X) = u$ for some $i$. $X \cap S$ and $X \cup S$ are defined as one would expect them to be (note
that $S$ can be a set or another inseq).
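
The auxiliary notions of this section are easy to mirror in code. The sketch below assumes that an inseq is represented as a non-empty Python tuple of instruction names; the function names are chosen for illustration and positions are 1-based, as in the text.

```python
def length(x):
    """The length function: number of instructions in the inseq x."""
    return len(x)

def sigma(i, x):
    """The i-th instruction of x (1-based)."""
    return x[i - 1]

def same_instruction(x, i, j):
    """The relation i =_X j: positions i and j of x hold the same instruction."""
    return sigma(i, x) == sigma(j, x)

def positions(u, x):
    """U(X): the positions in x of instructions contained in the set u."""
    return {i for i in range(1, len(x) + 1) if x[i - 1] in u}

def concat(x, y):
    """Concatenation X;Y of two inseqs."""
    return x + y
```

Tuple concatenation is associative, so the representation respects the requirement on _;_, and excluding the empty tuple by convention corresponds to the exclusion of the empty sequence.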



About Notation Let $X \in \mathcal{I}^+$ be an instruction sequence. Throughout this thesis we
will write $X^k$ for $k$ concatenations of $X$. That is,

\[ X^1 = X, \qquad X^{n+1} = X; X^n. \]

What about $X^0$? Our definition of an instruction sequence explicitly excludes the empty
sequence: an $\mathcal{I}$-expression will always contain at least one instruction. Still, within some
contexts it will prove convenient to talk about $X^k$ for any $k \in \mathbb{N}$. Throughout this thesis we
will only write $X^0$ as part of sequences which, as a whole, are guaranteed to be non-empty,
and are as such contained in $\mathcal{I}^+$ (i.e., the set of proper instruction sequences).

2.3.2 Code Semigroups 

Given some instruction set $\mathcal{I}$, every inseq $X \in \mathcal{I}^+$ is constructed by concatenation of a
finite number of elements in $\mathcal{I}$. Hence $\mathcal{I}$ generates $\mathcal{I}^+$, denoted $\langle \mathcal{I} \rangle = \mathcal{I}^+$. $\mathcal{I}^+$ is closed
under the associative binary operation _;_ and as such $\mathcal{I}^+$ is a semigroup with respect to _;_.
Clearly every instruction set $\mathcal{I}$ gives rise to a semigroup $(\langle \mathcal{I} \rangle, \_;\_)$. We will call such a
semigroup an instruction sequence semigroup or simply code semigroup. For an introduction
to semigroup theory we refer to [CP61].

About Notation Let $B$ refer to some code semigroup. Then we write $\mathcal{I}_B$ for the instruction
set of $B$. $\mathcal{I}_B^n$ denotes the $\mathcal{I}_B$-inseqs of length $n$, and $\mathcal{I}_B^+$ contains all $\mathcal{I}_B$-expressions.
Hence we write $B = (\mathcal{I}_B^+, \_;\_)$. When no confusion can arise, $\mathcal{I}_B$-instructions and $\mathcal{I}_B$-inseqs
may simply be referred to as $B$-instructions and $B$-inseqs ($B$-expressions), respectively.
Whenever $B$ is referred to as a set instead of a semigroup, it is identified with $\mathcal{I}_B^+$. That is,
$B$ stands for all well-formed $B$-expressions. Likewise $\mathcal{I}_B^+$ may be referred to as a semigroup,
in which case it is identified with $B$.

Subsemigroups Let $A$ and $B$ be two semigroups with respect to some operator $\cdot$, such
that $A \subseteq B$. Then $A$ is a subsemigroup of $B$. Equivalently, if $(B, \cdot)$ is a semigroup and
$A \subseteq B$ such that $a, b \in A$ implies $a \cdot b \in A$, then $A$ is a subsemigroup of $B$. Note that the
intersection of any subsemigroups of $B$ is either empty or itself a subsemigroup of $B$.³

Given an instruction set $\mathcal{I}$ we can take a subset of these instructions, $\mathcal{I}' \subset \mathcal{I}$. Observe
that the semigroup $\langle \mathcal{I}' \rangle$ is a strict subsemigroup of $\langle \mathcal{I} \rangle$. We will define plenty of such
subsemigroups later in this thesis.

Semigroup Homomorphisms Consider two code semigroups $A$ and $B$, and a function
$f : \mathcal{I}_A^+ \to \mathcal{I}_B^+$. Then $f$ is a mapping between instruction sequences. A significant part of this
thesis describes mappings between distinct code semigroups. Most of these mappings are
homomorphisms.

In general, a function $f : A \to B$ is a homomorphism between semigroups $(A, \cdot)$ and
$(B, *)$ iff $f(x \cdot y) = f(x) * f(y)$, for all $x, y \in A$. It is easy to see that $f$ only needs to be
defined explicitly on elements of $A$'s generating set. If $\langle G \rangle = A$, then for all $a \in A - G$ it
is the case that $a = g_0 \cdot g_1 \cdot \ldots \cdot g_{n+1}$, for some $n \in \mathbb{N}$ and $g_0, g_1, \ldots, g_{n+1} \in G$, and hence
$f(a) = f(g_0) * f(g_1) * \ldots * f(g_{n+1})$ by definition. In the specific case of code semigroups
this implies that a homomorphic function only needs to be defined explicitly on individual
instructions.
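
The last observation can be illustrated with a short sketch: a mapping defined on individual instructions is lifted to whole inseqs by concatenating the images. Inseqs are assumed to be tuples of instruction names, and both the helper `extend` and the example mapping `pad` (which appends a made-up "nop" instruction) are hypothetical.

```python
def extend(f_instr):
    """Lift a mapping from instructions to inseqs into a homomorphism on inseqs."""
    def f(x):
        image = []
        for u in x:
            image.extend(f_instr(u))   # f(u1;...;un) = f(u1);...;f(un)
        return tuple(image)
    return f

def pad(u):
    """Hypothetical instruction mapping: follow every instruction by "nop"."""
    return (u, "nop")

f = extend(pad)
```

By construction `f` satisfies f(X;Y) = f(X);f(Y) for all inseqs X and Y, whatever instruction mapping is supplied.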

2.3.3 Instruction Sequence Semantics 

It is the ability to be executed that sets instruction sequences apart from sequences of
arbitrary mathematical objects. Execution of an instruction sequence leads to (possibly
unobservable) behavior. Thus, for a sequence of objects to be called an instruction sequence,
it must be ascribed a semantics, such that its behavior upon execution is defined.

³We will not consider the empty semigroup.

This thesis will use basic thread algebra to that end. This allows us to define the
semantics of the semigroup C as in [BP09a] and provides an easy way to compare the code
semigroups introduced in this thesis to PGA on a syntactical as well as a semantic level.

In the tradition of PGA, instructions are viewed as atomic program components: at any
stage during the execution of a program at most one instruction is "active" (i.e., being
executed).⁴ We will define the behavior of individual instructions based on their position $i$
within an instruction sequence $X$. Execution of an individual instruction may or may not
cause an action to be performed, after which control of execution is transferred to another
position in $X$. Then, given the position of the first instruction to be executed, the semantics
of the instruction sequence as a whole follows naturally.

The first instruction to be executed is called the initial or start instruction. The leftmost
and rightmost instruction of an inseq are obvious candidates to be designated as such, but
given a specific instruction sequence $X$, execution can start at any position within $X$. Thus
for all $X \in \mathcal{I}^+$ and $1 \leq i \leq \ell(X)$, the pair $(i, X)$ can be identified with a certain thread,
namely the thread which represents the behavior resulting from the execution of $X$ starting
with the $i$th instruction. Though not strictly necessary, for any invalid instruction position
$i$ (i.e. $i < 1$ or $i > \ell(X)$) the pair $(i, X)$ will be identified with some default thread $\mathcal{D}$.
Once $\mathcal{D}$ has been fixed, every pair $(i, X) \in \mathbb{Z} \times \mathcal{I}^+$ is identified with a certain thread $T$.
Throughout this thesis we will consider only one value for $\mathcal{D}$, namely D, i.e. deadlock.

In this way the thread extraction operator $|\_, \_| : \mathbb{Z} \times \mathcal{I}^+ \to \mathrm{BTA}^\infty$ specifies the semantics
of a semigroup $\mathcal{I}^+$. For convenience we will usually write $|X|^i$ instead of $|i, X|$, but this is
merely a notational matter. For any $X \in \mathcal{I}^+$, the thread describing the behavior of $X$ if
executed starting from the leftmost instruction is called its left behavior, written $|X|^{\rightarrow} = |X|^1$.
Likewise $|X|^{\leftarrow} = |X|^{\ell(X)}$ is called the right behavior of $X$, meaning the behavior of
$X$ if executed starting from the rightmost instruction.

Once specific code semigroups have been defined — along with suitable thread extraction 
operators — it becomes possible to analyze their expressiveness. Given equally expressive 
code semigroups A and B one can define mappings between them, such that the behavior 
of any inseq X in the domain is in some way reflected by the behavior of the corresponding 
inseq Y to which it is mapped in the codomain. Similar mappings can also be defined from 
a semigroup A onto itself. 

Definition 2.2. Let $A$ and $B$ be two code semigroups on which the thread extraction
operators $|\_, \_|_A : \mathbb{Z} \times \mathcal{I}_A^+ \to \mathrm{BTA}^\infty$ and $|\_, \_|_B : \mathbb{Z} \times \mathcal{I}_B^+ \to \mathrm{BTA}^\infty$ are defined, respectively.
Consider arbitrary $X \in \mathcal{I}_A^+$ and $Y \in P$ and three mappings $f : \mathcal{I}_A^+ \to \mathcal{I}_B^+$, $g : \mathcal{I}_A^+ \to P$ and
$h : P \to \mathcal{I}_A^+$. Then,

• $f$ is left behavior preserving if $|X|_A^{\rightarrow} = |f(X)|_B^{\rightarrow}$.

• $f$ is right behavior preserving if $|X|_A^{\leftarrow} = |f(X)|_B^{\leftarrow}$.

• $f$ is left-right behavior preserving if it is both left and right behavior preserving.

• $f$ is behavior preserving if it is left or right behavior preserving.

• $f$ is left uniformly behavior preserving if there exists some $b \in \mathbb{N}^+$ such that [...]
for all $i \in \mathbb{Z}$. Observe that every left uniformly behavior preserving
mapping is left behavior preserving.

• $f$ is right uniformly behavior preserving if there exists some $b \in \mathbb{N}^+$ such that [...]
for all $i \in \mathbb{Z}$. Observe that every right uniformly behavior preserving mapping
is right behavior preserving.

• $f$ is left-right uniformly behavior preserving if it is both left and right uniformly
behavior preserving.

• $f$ is uniformly behavior preserving if it is left or right uniformly behavior preserving.

• $g$ is behavior preserving if $|X|_A^{\rightarrow} = |g(X)|_{\mathrm{PGA}}$.

• $h$ is behavior preserving if $|Y|_{\mathrm{PGA}} = |h(Y)|_A^{\rightarrow}$.⁵

A behavior preserving mapping will also be called a translation because it preserves the
meaning of the original (single-pass) instruction sequence.

This concludes the preliminaries. We are now ready to introduce the code semigroup C
in the next chapter.

⁴One could draw a parallel with the program counter as found in central processing units (CPUs), which
holds the memory address of the instruction that is currently executed (or the instruction which is to be
executed next, depending on the architecture).

⁵A more general definition would be that $h$ is behavior preserving if there exists a function $t : P \to \mathbb{Z}$
such that $|Y|_{\mathrm{PGA}} = |t(Y), h(Y)|_A$. The definition given above suffices for our purposes.



Chapter 3



C Instruction Sequences 



The previous chapter introduced PGA as a means to describe programs and BTA as a means
to describe their behavior. It then introduced an alternative representation of program
objects, namely strictly finite instruction sequences, as opposed to PGA's infinite single-pass
instruction sequences. Upon specifying an instruction set $\mathcal{I}$, the set of finite instruction
sequences generated by concatenating elements of $\mathcal{I}$ forms a semigroup. This chapter
introduces one such semigroup and its semantics.

C was first described in [BP09a]. C is a code semigroup without directional bias:
execution of a C-inseq can start at the leftmost instruction (the natural choice for most people
in Western society), but may just as well start at the rightmost instruction. In fact, given
some instruction sequence $X$, any position within $X$ can be designated as starting position.

This chapter is built up as follows: §3.1 will introduce C's instruction set and provide
some basic examples of C-expressions. It will also motivate the inclusion in the instruction
set of an instruction which upon execution will cause deadlock. Next, §3.2 formalizes the
semantics of C-expressions using thread algebra. Based on this, §3.3 introduces some
accessibility relations on instruction positions which will be used throughout this thesis. Lastly,
the final section briefly discusses a small syntactic and semantic variation on C.



3.1 The Instruction Set 

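Given a set $A$ of actions, C defines basic instructions $\mathfrak{B}$, positive test instructions $\mathfrak{P}$, negative
test instructions $\mathfrak{N}$ and relative jumps $\mathfrak{J}$:

\[
\begin{aligned}
\mathfrak{B} &= \bigcup_{a \in A} \{/a, \backslash a\}, & \mathfrak{J} &= \bigcup_{k \in \mathbb{N}^+} \{/\#k, \backslash\#k\}, \\
\mathfrak{P} &= \bigcup_{a \in A} \{+/a, +\backslash a\}, & \mathfrak{N} &= \bigcup_{a \in A} \{-/a, -\backslash a\}.
\end{aligned}
\]

$A$ is a parameter to C which is often kept implicit. Additionally, C has an abort instruction
# and termination instruction !. Instructions with a backward slash are called left oriented
or backward instructions; those with a forward slash are called right oriented or forward
instructions. Instructions with a left (right) orientation are also said to have a left (right)
directionality. Formally, $C = (\mathcal{I}_C^+, \_;\_)$, with the set of all C-expressions $\mathcal{I}_C^+$ generated by
C's instruction set $\mathcal{I}_C$, defined as

\[ \mathcal{I}_C = \mathfrak{B} \cup \mathfrak{P} \cup \mathfrak{N} \cup \mathfrak{J} \cup \{\#, !\}. \]

Let $a, b, c \in A$. Then examples of C-expressions are

    /a,    /a;+/a;!;\#3,    +/b;-/c;-\c,    \#2;-\c.        (3.1)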

Each C-inseq has a semantics. Before we formalize this, it will prove convenient to informally 
describe the meaning of some of the instructions: 
/a is a forward basic instruction. It causes execution of the action a, after which the
instruction to its right is executed, if it exists. Otherwise deadlock occurs. Note that
the boolean reply resulting from a's execution is ignored.

+/a is a forward positive test instruction. Action a is executed. If its boolean reply is
true, then the instruction immediately to its right is executed. On false, however,
this instruction is skipped, and execution proceeds at the second instruction to its
right. If no such instruction exists, deadlock follows.

-/a is a forward negative test instruction. -/a mirrors the behavior of +/a, in the sense
that the effect of the replies true and false is reversed.

/#k is a forward jump instruction. It causes execution of the instruction k positions to its
right, if such an instruction exists. Otherwise deadlock will follow.

# is the abort instruction. Execution of this instruction causes deadlock.

! is the termination instruction. It causes the program to halt successfully.

The instructions \a, +\a, -\a and \#k are the backward versions of /a, +/a, -/a
and /#k, respectively, in the sense that they have a right-to-left instead of a left-to-right
orientation. For example, execution of \a results in action a, after which the instruction to
its left is executed (if such an instruction exists).

A jump instruction /#k or \#k has jump counter k and performs a jump of distance k
instructions. /#k and \#k are said to be relative jumps. The function $\delta : \mathfrak{J} \to \mathbb{N}^+$ returns
the jump counter of a given jump instruction (e.g. $\delta(/\#6) = 6$).

We define $\mathcal{I}_C^{\rightarrow} \subset \mathcal{I}_C$ to be the set of forward instructions. Likewise, $\mathcal{I}_C^{\leftarrow} \subset \mathcal{I}_C$ denotes the
set of backward instructions. Formally:¹

\[
\begin{aligned}
\mathcal{I}_C^{\rightarrow} &= \bigcup_{a \in A} \{/a, +/a, -/a\} \cup \bigcup_{k \in \mathbb{N}^+} \{/\#k\}, \\
\mathcal{I}_C^{\leftarrow} &= \bigcup_{a \in A} \{\backslash a, +\backslash a, -\backslash a\} \cup \bigcup_{k \in \mathbb{N}^+} \{\backslash\#k\}.
\end{aligned}
\]

The sets $\mathfrak{B}^{\rightarrow} = \mathfrak{B} \cap \mathcal{I}_C^{\rightarrow}$ and $\mathfrak{B}^{\leftarrow} = \mathfrak{B} \cap \mathcal{I}_C^{\leftarrow}$ denote the forward and backward oriented
basic instructions, respectively. Likewise for $\mathfrak{P}$, $\mathfrak{N}$ and $\mathfrak{J}$. Note that with the exception
of the abort instruction # and the termination instruction !, every C-instruction has a
direction, which is either forward or backward, but not both. That is, $\mathcal{I}_C^{\rightarrow} \cap \mathcal{I}_C^{\leftarrow} = \emptyset$ and
$\mathcal{I}_C^{\rightarrow} \cup \mathcal{I}_C^{\leftarrow} \cup \{\#, !\} = \mathcal{I}_C$. We write $u \sim v$ if instructions $u$ and $v$ have the same direction
(or no direction). $\sim \; \subset \mathcal{I}_C \times \mathcal{I}_C$ is the directionality relation. It is clearly an equivalence
relation.



Examples These informal definitions of the meaning of each instruction allow us to
verbally describe the meaning of the example C-expressions of (3.1), provided that we agree
upon which instruction is the first to be executed. Since this thesis is written in English,
which has an obvious left-to-right bias, we will designate the leftmost instruction to be the
initial instruction. Thus we will informally describe the left behavior of these inseqs.

• /a: Performs action a, after which deadlock occurs. 

• /a; +/a; !; \#3: Performs action a twice in a row. If the second action yields a positive 
reply, then the program terminates. Otherwise it starts all over. 

^Note, again, that the set A of actions is an implicit parameter for C (and thereby for $\mathcal{I}_C$, and
• +/b; -/c; -\c: Performs action b. If this yields the reply true, then action c will
be performed, as specified by the second instruction. Here, a positive reply causes
deadlock and a negative reply causes the third instruction to be executed. If the
action b yields false then the third instruction will also be executed. The action c
as performed by the third instruction causes execution to continue at either the first
or second instruction, depending on whether it yields a positive or negative reply,
respectively.^

• \#2; -\c: Does not perform any action. Execution of this program immediately causes
deadlock, since the first instruction jumps outside of the inseq.

3.1.1 The Case for an Explicit Abort Instruction 

A draft version of the original paper on C [BP09b] provided a definition of the semigroup
C which differs slightly from the one that was published in [BP09a] (which is introduced in
the previous section). Let us refer to the semigroup as it was introduced in [BP09b] by the
name C'.

The instruction set $\mathcal{I}_{C'}$ did not contain an explicit abort instruction. It did however
contain two other instructions which C lacks: /#0 and \#0, both of which signify a jump
of distance zero^. That is, $C' = (\mathcal{I}_{C'}^+, \_;\_)$, with

\[ \mathcal{I}_{C'} = \{/\#0,\ \backslash\#0\} \cup \mathcal{I}_C - \{\#\}. \]

Since /#0 and \#0 are under all circumstances behaviorally indistinguishable, C' had an
extra axiom (aside from the obvious axiom which states that concatenation is associative)
which stated that no distinction is made between forward and backward jumps of distance
0:

\[ /\#0 = \backslash\#0. \]

A jump of distance 0 is not really a jump at all and it is rather meaningless to talk about
the direction of such a jump. Semantically both /#0 and \#0 signify deadlock. Moreover,
the introduction of two distinct but equivalent instructions allows for the definition of a
mapping f on $\mathcal{I}_{C'}^+$ such that $X = Y$ while $f(X) \neq f(Y)$.

It was therefore argued that C' should only contain jumps /#k and \#k for k > 0,
together with a single non-directional abort instruction #, thereby eliminating the need for
the axiom /#0 = \#0 while retaining a single instruction with essentially the same behavior
as /#0 and \#0. This chain of reasoning naturally led to the definition of an alternative
semigroup, the one introduced in [BP09a] and the previous section under the name C.^

Execution of the abort instruction has the same effect as an attempt to transfer execution
to a non-existing instruction. Since every instruction sequence is finite, one can take any
inseq $X \in \mathcal{I}_C^+$ and construct a behaviorally equivalent inseq X' by replacing every abort
instruction with a jump to a position < 1 or > $\ell(X)$. Hence # does not increase C's
expressiveness. Still, as we will later see, the abort instruction is a convenient addition to
the instruction set.
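The replacement argument can be sketched in a few lines of Python. The encoding of an inseq as a list of instruction strings and the name `eliminate_abort` are assumptions made for this illustration only; every abort instruction is replaced by a forward jump to position $\ell(X) + 1$, which is one of the out-of-range targets the text allows.

```python
def eliminate_abort(inseq):
    """Replace every abort instruction '#' by a forward jump past the end.

    `inseq` is a list of instruction strings (hypothetical encoding);
    positions are 1-based, as in the text.
    """
    n = len(inseq)
    # At position i the jump counter n - i + 1 targets position n + 1 > l(X).
    return [f"/#{n - i + 1}" if u == "#" else u
            for i, u in enumerate(inseq, start=1)]


print(eliminate_abort(["+/a", "#", "!"]))
```

Since i never exceeds the length of the inseq, the computed jump counter is always at least 1, so the result is a valid C-expression.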



^Compare the length of this description to that of the actual program, and it becomes apparent that
natural language is not really suited to produce concise descriptions of program behavior. There is also the
problem of the inherent ambiguity of natural language. Luckily basic thread algebra provides a concise and
unambiguous alternative!

^The existence of these instructions was probably inspired by the #0 instruction as found in PGA.

^The introduction of the instruction # is not really a first. A similar instruction can be found in [BL00],
where it is introduced as part of PGA. It must be noted though, that [BL00] ascribes a different semantics
to #, namely meaningless behavior, than to #0, which produces divergent behavior. The latter notion
coincides with what is referred to in this thesis as deadlock (D in basic thread algebra). BTA does not
provide a constant to represent meaningless behavior. As mentioned in a footnote in [BL02], # was later
dropped and should in hindsight be seen as an abbreviation for #0.

For completeness, we define two homomorphisms $f : \mathcal{I}_{C'}^+ \to \mathcal{I}_C^+$ and $g : \mathcal{I}_C^+ \to \mathcal{I}_{C'}^+$, which
make the correspondence between C' and C explicit. They are defined on individual
instructions u as follows:

\[
f(u) = \begin{cases} \# & \text{if } u \in \{/\#0,\ \backslash\#0\}, \\ u & \text{otherwise,} \end{cases}
\qquad
g(u) = \begin{cases} /\#0 & \text{if } u = \#, \\ u & \text{otherwise.} \end{cases}
\]

Now clearly $f \circ g$ is the identity function on C-expressions. The axiom /#0 = \#0 ensures
that likewise $g \circ f$ is an identity function on C'-expressions.
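As a small illustration, the two homomorphisms can be written down directly in Python. The list-of-strings encoding is an assumption of this sketch; the names `f` and `g` follow the text.

```python
def f(inseq):
    """C' -> C: both zero-distance jumps become the abort instruction."""
    return ["#" if u in ("/#0", "\\#0") else u for u in inseq]


def g(inseq):
    """C -> C': the abort instruction becomes the forward zero-distance jump."""
    return ["/#0" if u == "#" else u for u in inseq]


X = ["/a", "#", "!"]
print(f(g(X)) == X)             # f after g leaves a C-expression unchanged
print(g(f(["\\#0", "/a"])))     # \#0 comes back as /#0, equal under the axiom
```

The second call shows why the axiom is needed: g after f returns /#0 where the input had \#0, and the two are only identified by /#0 = \#0.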



3.2 Semantics 



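As discussed in §2.3.3, C's semantics are defined using basic thread algebra. Thus any
combination of start position i and inseq $X \in \mathcal{I}_C^+$ is assigned some thread, written $|X|_C^i$.
The thread extraction operator $|\_|_C^{\_} : \mathbb{Z} \times \mathcal{I}_C^+ \to \mathrm{BTA}^{\mathrm{reg}}$ is defined on all
$i \in \mathbb{Z}$ and $X \in \mathcal{I}_C^+$ as,

\[
|X|_C^i =
\begin{cases}
D & \text{if } i < 1 \text{ or } i > \ell(X), \\
a \circ |X|_C^{i+1} & \text{if } \sigma_i(X) = /a, \\
|X|_C^{i+1} \trianglelefteq a \trianglerighteq |X|_C^{i+2} & \text{if } \sigma_i(X) = +/a, \\
|X|_C^{i+2} \trianglelefteq a \trianglerighteq |X|_C^{i+1} & \text{if } \sigma_i(X) = -/a, \\
|X|_C^{i+k} & \text{if } \sigma_i(X) = /\#k, \\
a \circ |X|_C^{i-1} & \text{if } \sigma_i(X) = \backslash a, \\
|X|_C^{i-1} \trianglelefteq a \trianglerighteq |X|_C^{i-2} & \text{if } \sigma_i(X) = +\backslash a, \\
|X|_C^{i-2} \trianglelefteq a \trianglerighteq |X|_C^{i-1} & \text{if } \sigma_i(X) = -\backslash a, \\
|X|_C^{i-k} & \text{if } \sigma_i(X) = \backslash\#k, \\
D & \text{if } \sigma_i(X) = \#, \\
S & \text{if } \sigma_i(X) = {!}.
\end{cases}
\tag{3.2}
\]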



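In words, the thread $|X|_C^i$ describes the behavior resulting from the execution of the inseq
X starting at the ith instruction. Recall that we defined $|X|_C^{\rightarrow}$ and $|X|_C^{\leftarrow}$
to mean X's left and right behavior, respectively:

\[ |X|_C^{\rightarrow} = |X|_C^{1} \quad\text{and}\quad |X|_C^{\leftarrow} = |X|_C^{\ell(X)}. \]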
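Read from left to right, the equations in (3.2) describe a simple interpreter. The following Python sketch makes that reading concrete under some assumptions of our own: an inseq is written as a string `u1; u2; ...`, the environment is modelled by a finite list of boolean replies, and the names `parse` and `run` are hypothetical. Loops of jumps are merely cut off by a step budget here.

```python
import re


def parse(text):
    """Split 'u1; u2; ...' into (sign, step, body) triples.

    sign is '', '+' or '-'; step is +1 (forward), -1 (backward) or 0 (# and !).
    """
    inseq = []
    for tok in (t.strip() for t in text.split(";")):
        if tok in ("#", "!"):
            inseq.append(("", 0, tok))
            continue
        m = re.fullmatch(r"([+-]?)([/\\])(#[1-9]\d*|[a-z]\w*)", tok)
        if m is None or (m.group(1) and m.group(3).startswith("#")):
            raise ValueError(f"not a C-instruction: {tok!r}")
        sign, slash, body = m.groups()
        inseq.append((sign, 1 if slash == "/" else -1, body))
    return inseq


def run(text, start, replies, max_steps=1000):
    """Execute from 1-based position `start`, one reply per performed action.

    Returns (actions, outcome) where outcome is 'S' (termination), 'D'
    (deadlock) or None (replies or step budget exhausted).
    """
    X, replies = parse(text), iter(replies)
    i, actions = start, []
    for _ in range(max_steps):
        if not 1 <= i <= len(X):
            return actions, "D"            # position outside of the inseq
        sign, step, body = X[i - 1]
        if body == "!":
            return actions, "S"
        if body == "#":
            return actions, "D"
        if body.startswith("#"):           # relative jump /#k or \#k
            i += step * int(body[1:])
            continue
        reply = next(replies, None)
        if reply is None:
            return actions, None
        actions.append(body)
        if sign == "" or reply == (sign == "+"):
            i += step                      # adjacent instruction
        else:
            i += 2 * step                  # skip one instruction
    return actions, None


print(run("/a; +/a; !; \\#3", 1, [True, False, True, True]))
```

Starting the same call at position 4 instead of 1 gives the right behavior; for this inseq it coincides with the left behavior, since the last instruction jumps straight back to position 1.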



Examples We will apply thread extraction on the instruction sequences of (3.1) to
determine their left as well as right behavior.

• The C-expression /a consists of a single instruction, and as such its left and right
behavior are equivalent:

\[
\begin{aligned}
|/a|_C^{\rightarrow} &= |/a|_C^{1} = a \circ D, \\
|/a|_C^{\leftarrow} &= |/a|_C^{\ell(/a)} = |/a|_C^{1} = a \circ D.
\end{aligned}
\]

• Let X = /a; +/a; !; \#3. The left behavior of this instruction sequence is infinite, as
we have seen in §3.1. This is confirmed by several applications of equations in (3.2):

\[ |X|_C^{\rightarrow} = |X|_C^{1} = a \circ |X|_C^{2} = a \circ (|X|_C^{3} \trianglelefteq a \trianglerighteq |X|_C^{4}) = a \circ (S \trianglelefteq a \trianglerighteq |X|_C^{\rightarrow}). \]

Observe that $|X|_C^{\rightarrow}$ is recursively defined by the equation $P = a \circ (S \trianglelefteq a \trianglerighteq P)$. As for
the right behavior of X, we observe that $|X|_C^{\leftarrow} = |X|_C^{\ell(X)} = |X|_C^{4} = |X|_C^{1} = |X|_C^{\rightarrow}$.



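• Let X = +/b; -/c; -\c. Upon trying to extract its behavior, we see that

\[
\begin{aligned}
|X|_C^{\rightarrow} = |X|_C^{1} &= |X|_C^{2} \trianglelefteq b \trianglerighteq |X|_C^{3} \\
&= (|X|_C^{4} \trianglelefteq c \trianglerighteq |X|_C^{3}) \trianglelefteq b \trianglerighteq (|X|_C^{1} \trianglelefteq c \trianglerighteq |X|_C^{2}) \\
&= (D \trianglelefteq c \trianglerighteq (|X|_C^{1} \trianglelefteq c \trianglerighteq |X|_C^{2})) \trianglelefteq b \trianglerighteq (|X|_C^{1} \trianglelefteq c \trianglerighteq |X|_C^{2}).
\end{aligned}
\]

The behavior is clearly infinite, and no single recursive equation can describe it. The
following linear recursive specification does:

\[ P_1 = P_2 \trianglelefteq b \trianglerighteq P_3, \quad P_2 = P_4 \trianglelefteq c \trianglerighteq P_3, \quad P_3 = P_1 \trianglelefteq c \trianglerighteq P_2, \quad P_4 = D. \]

Now $|X|_C^{\rightarrow} = P_1$ and $|X|_C^{\leftarrow} = P_3$.

• Let X = \#2; -\c. Then $|X|_C^{\rightarrow} = D$ and $|X|_C^{\leftarrow} = c \circ D$, because

\[
\begin{aligned}
|X|_C^{\rightarrow} &= |X|_C^{1} = |X|_C^{-1} = D, \\
|X|_C^{\leftarrow} &= |X|_C^{2} = |X|_C^{0} \trianglelefteq c \trianglerighteq |X|_C^{1} = D \trianglelefteq c \trianglerighteq |X|_C^{-1} = D \trianglelefteq c \trianglerighteq D = c \circ D.
\end{aligned}
\]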

Loops Without Activity C-inseqs may contain chained jump instructions which form
a loop. The equations of (3.2) do not adequately handle this situation, as they do not assign
any specific thread to the execution of such a loop. Hence we introduce an additional rule
for the extraction of behavior from C instruction sequences:^

    If the equations in (3.2) can be applied infinitely often from left to right
    without ever yielding an action, then the extracted thread is D.        (3.3)

As an example of the application of this rule, consider the C instruction sequence X =
/#3; \#1; !; \#2; #; +\a. Its left behavior is

\[ |X|_C^{\rightarrow} = |X|_C^{1} = |X|_C^{4} = |X|_C^{2} = |X|_C^{1} = D. \]

Here we derive that $|X|_C^{1}$, $|X|_C^{4}$ and $|X|_C^{2}$ equal D by means of three left-to-right applications
of equations in (3.2) followed by application of (3.3). Indeed, the instructions at positions
1, 2 and 4 form a closed loop without any non-jump instructions. This example is also yet
another demonstration of the fact that the left and right behavior of an inseq are in general
not equivalent; the right behavior of X is,

\[ |X|_C^{\leftarrow} = |X|_C^{\ell(X)} = |X|_C^{6} = |X|_C^{5} \trianglelefteq a \trianglerighteq |X|_C^{4} = D \trianglelefteq a \trianglerighteq D = a \circ D. \]
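Rule (3.3) amounts to cycle detection on chains of jumps. The sketch below, with the hypothetical name `resolve` and an inseq encoded as a list of instruction strings (both assumptions of this illustration), follows a chain of relative jumps and reports deadlock as soon as a position is visited twice or the chain leaves the inseq.

```python
def resolve(inseq, i):
    """Follow chained jumps from 1-based position i.

    Returns the first position holding a non-jump instruction, or 'D' when
    the chain leaves the inseq or closes a loop without activity (rule 3.3).
    """
    seen = set()
    while 1 <= i <= len(inseq):
        u = inseq[i - 1]
        if u[:2] not in ("/#", "\\#"):
            return i                  # a non-jump instruction is reached
        if i in seen:
            return "D"                # closed loop of jumps, no action ever
        seen.add(i)
        k = int(u[2:])
        i = i + k if u[0] == "/" else i - k
    return "D"                        # jumped outside of the inseq


X = ["/#3", "\\#1", "!", "\\#2", "#", "+\\a"]
print(resolve(X, 1), resolve(X, 6))
```

For the example above, position 1 resolves to deadlock because positions 1, 4 and 2 form a loop, while position 6 holds a test instruction and is returned unchanged.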

Proposition 3.1. Each thread definable in C is regular, and each regular thread can be
expressed in C.

Proof. Let $X \in \mathcal{I}_C^+$. Following (3.2) and (3.3) we have that for arbitrary $i \in [1, \ell(X)]$ one
of the following is the case (for some $j, k \in \mathbb{Z}$):

\[ |X|_C^{i} = S, \qquad |X|_C^{i} = D, \qquad |X|_C^{i} = |X|_C^{j} \trianglelefteq a \trianglerighteq |X|_C^{k}. \]

Let $[i]_X = \{j \in [1, \ell(X)] \mid |X|_C^{j} = |X|_C^{i}\}$ be an equivalence class of positions in X from
which identical behavior can be extracted. Let Q be the corresponding quotient set of
$[1, \ell(X)]$. Then for all $[i] \in Q$ we define,

\[
P_{[i]} =
\begin{cases}
S & \text{if } |X|_C^{i} = S, \\
D & \text{if } |X|_C^{i} = D, \\
P_{[j]} \trianglelefteq a \trianglerighteq P_{[k]} & \text{if } |X|_C^{i} = |X|_C^{j} \trianglelefteq a \trianglerighteq |X|_C^{k}.
\end{cases}
\]

Now for all $i \in [1, \ell(X)]$ the thread $|X|_C^{i}$ equals $P_{[i]}$, which is completely specified by the
above linear equations and is thus regular.

Conversely, let $T \in \mathrm{BTA}^{\mathrm{reg}}$ be described by the linear equations $P_0 = t_0$, $P_1 = t_1$, ...,
$P_{n-1} = t_{n-1}$. Then there exists an $X \in \mathcal{I}_C^+$ with $\ell(X) = 3n$ such that $P_i = |X|_C^{3i+1}$ and
thus specifically $|X|_C^{\rightarrow} = P_0$. We construct X as follows: If $P_i = S$ then set $\sigma_{3i+1}(X) = {!}$. If
$P_i = D$ then set $\sigma_{3i+1}(X) = \#$. Otherwise $P_i = P_j \trianglelefteq a \trianglerighteq P_k$, thus we set $\sigma_{3i+1}(X) = +/a$.
$\sigma_{3i+2}(X)$ and $\sigma_{3i+3}(X)$ are jump instructions to positions 3j + 1 and 3k + 1, respectively.
Positions in X for which no instruction has been specified can be assigned an arbitrary
instruction. □

^This rule is near identical to the rule (2.4) which assigns a thread to infinite sequences of chained jump
instructions in PGA.
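The converse construction in the proof is mechanical enough to sketch in Python. The input format is an assumption of this illustration: `spec[i]` is `'S'`, `'D'`, or a triple `(a, j, k)` standing for $P_i = P_j \trianglelefteq a \trianglerighteq P_k$; the name `from_spec` is hypothetical, and positions left unspecified by the proof are filled with the abort instruction (any instruction would do).

```python
def from_spec(spec):
    """Build a C-inseq of length 3n from linear equations P_0, ..., P_{n-1}."""
    def jump(src, dst):
        # Relative jump from 1-based position src to dst; src != dst always
        # holds here because src is 2 or 0 modulo 3 and dst is 1 modulo 3.
        return f"/#{dst - src}" if dst > src else f"\\#{src - dst}"

    X = []
    for i, eq in enumerate(spec):
        if eq == "S":
            X += ["!", "#", "#"]
        elif eq == "D":
            X += ["#", "#", "#"]
        else:
            a, j, k = eq
            # +/a continues at 3i+2 on true and at 3i+3 on false.
            X += [f"+/{a}", jump(3 * i + 2, 3 * j + 1), jump(3 * i + 3, 3 * k + 1)]
    return X


# P_0 = a o P_1 (written as P_1 <|a|> P_1), P_1 = P_2 <|a|> P_0, P_2 = S
print(from_spec([("a", 1, 1), ("a", 2, 0), "S"]))
```

The usage line encodes the thread $P = a \circ (S \trianglelefteq a \trianglerighteq P)$ from the earlier example, using the identity $a \circ Q = Q \trianglelefteq a \trianglerighteq Q$ to bring it into the required linear form.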

3.3 The Reachability of Instructions 

If the equations in (3.2) are read strictly from left to right, then they define for a given inseq
X and an arbitrary instruction at position i in X which action said instruction performs (if
any) and at which program position(s) j execution may proceed. Let us define this relation
between program positions as follows:

Definition 3.2. Let $X \in \mathcal{I}_C^+$. Then the accessibility relation $\to_X\ \subseteq \mathbb{Z}^2$ of X is defined as:

\[ i \to_X j \iff |X|_C^{i} \in \{\, |X|_C^{j},\ \ a \circ |X|_C^{j},\ \ |X|_C^{j} \trianglelefteq a \trianglerighteq |X|_C^{k},\ \ |X|_C^{k} \trianglelefteq a \trianglerighteq |X|_C^{j} \,\} \]

according to a single left-to-right application of an equation in (3.2).

That is, $i \to_X j$ iff execution may continue at position j right after the instruction at position
i has been executed. We then call i the source position and $\sigma_i(X)$ the source instruction.
Likewise j and $\sigma_j(X)$ are the target position and target instruction, respectively.

As usual, $\to_X^{+}$ denotes the transitive closure of the relation $\to_X$. Likewise $\to_X^{*}$ is its
reflexive and transitive closure.

Definition 3.3. Let $X \in \mathcal{I}_C^+$. A program position j is reachable from position i in X if
$i \to_X^{*} j$.^ The set $\mathcal{R}_{X,i} = \{j \mid i \to_X^{*} j\}$ contains i and all positions reachable from i in X.
Its complement $\overline{\mathcal{R}}_{X,i} = \mathbb{Z} - \mathcal{R}_{X,i}$ naturally contains those positions which are unreachable
from i. Note that $\mathcal{R}_{X,i}$ may include "invalid" program positions, i.e. positions outside of
X.

Definition 3.4. The set $\mathcal{E}_X = \{i \in [1, \ell(X)] \mid i \to_X j,\ j \notin [1, \ell(X)]\}$ contains the exit
positions of X. That is, execution of an instruction at some position in $\mathcal{E}_X$ may cause a
position outside X to be "reached".

Proposition 3.5. Every regular thread can be described by a C instruction sequence in
which every instruction is reachable from the start instruction.

Proof. Consider arbitrary $T \in \mathrm{BTA}^{\mathrm{reg}}$, $X \in \mathcal{I}_C^+$ and $i \in [1, \ell(X)]$ such that $|X|_C^{i} = T$. If
$\overline{\mathcal{R}}_{X,i} \cap [1, \ell(X)] = \emptyset$, then T, X and i meet the requirements. Otherwise, arbitrarily select
some unreachable position $j \in \overline{\mathcal{R}}_{X,i} \cap [1, \ell(X)]$.

If the jth instruction is removed from X, then the jump counter of any jump instruction
which jumps over position j should be reduced by one, so as to ensure that its target
instruction remains the same. This is possible since said jump counter must be at least 2.
We do not have to be concerned with any other instruction which can transfer control of
execution to or over position j; such an instruction must itself not be reachable (because
position j isn't) and has as such no effect on X's behavior.

The result of removing the instruction at position j from X is an inseq X' such that
either $|X'|_C^{i-1} = T$ or $|X'|_C^{i} = T$, depending on whether j < i or j > i, respectively. This
process can be repeated until all unreachable instructions are removed. □

^Note that every instruction is reachable from itself. This is somewhat unconventional, but convenient
for our purposes.
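The pruning step in the proof of Proposition 3.5 can be sketched as follows. The sketch removes all unreachable instructions at once rather than one at a time, which yields the same inseq; the list-of-strings encoding and the names `successors`, `reachable` and `prune` are assumptions of this illustration.

```python
def successors(X, i):
    """Positions j with i ->_X j, for a 1-based position i inside X."""
    u = X[i - 1]
    if u in ("#", "!"):
        return []
    step = 1 if "/" in u else -1
    body = u.lstrip("+-")[1:]
    if body.startswith("#"):                      # relative jump
        return [i + step * int(body[1:])]
    return [i + step, i + 2 * step] if u[0] in "+-" else [i + step]


def reachable(X, start):
    """All positions reachable from `start`, possibly including invalid ones."""
    seen, todo = set(), [start]
    while todo:
        i = todo.pop()
        if i not in seen:
            seen.add(i)
            if 1 <= i <= len(X):
                todo.extend(successors(X, i))
    return seen


def prune(X, start):
    """Drop unreachable instructions; return the new inseq and start position."""
    live = {p for p in reachable(X, start) if 1 <= p <= len(X)}
    out = []
    for p in sorted(live):
        u = X[p - 1]
        if u[:2] in ("/#", "\\#"):
            k = int(u[2:])
            t = p + k if u[0] == "/" else p - k
            lo, hi = max(min(p, t) + 1, 1), min(max(p, t), len(X) + 1)
            # One less per removed position that the jump passes over.
            k -= sum(1 for q in range(lo, hi) if q not in live)
            u = u[:2] + str(k)
        out.append(u)
    return out, sum(1 for q in live if q <= start)


print(prune(["/a", "/#2", "/b", "+/a", "!", "\\#4"], 1))
```

In the usage line only position 3 is unreachable from position 1, so both jumps that pass over it have their counters reduced by one.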



3.4 A Small Variation on C

For each $a \in A$, C provides four test instructions: +/a, -/a, +\a and -\a. Semantically
speaking the first two of these have immediate counterparts in PGA: +a and -a. The
latter two are backward versions of the former two, and thus are indirectly based on (or
even inspired by) the PGA test instructions as well.

C's lack of directional bias allows for a different semantics for test instructions, though; 
one that is instead inspired by the postconditional composition operator as found in basic 
thread algebra. Consider the following two instructions: 

+a is the positive test instruction. It performs action a. If the environment returns true
after completion of action a, the instruction to the left of the current instruction is
executed. Otherwise the instruction to its right is executed.

-a is the negative test instruction. This instruction mirrors the behavior of +a, in that it
transfers control to the left or right if the action a yields false or true, respectively.

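These instructions are syntactically indistinguishable from PGA's test instructions, but
they differ semantically. We define a code semigroup $C' = (\mathcal{I}_{C'}^+, \_;\_)$ with

\[ \mathcal{I}_{C'} = \Big(\mathcal{I}_C - \bigcup_{a \in A} \{+/a,\ -/a,\ +\backslash a,\ -\backslash a\}\Big) \cup \bigcup_{a \in A} \{+a,\ -a\}. \]

C''s semantics can be formalized by altering the set of equations (3.2): the cases related to
$\sigma_i(X) \in \{+/a, -/a, +\backslash a, -\backslash a\}$ are no longer applicable, while two cases to handle $\sigma_i(X) \in \{+a, -a\}$
need to be added. Thus we define for all $i \in \mathbb{Z}$ and $X \in \mathcal{I}_{C'}^+$,

\[
|X|_{C'}^{i} =
\begin{cases}
D & \text{if } i < 1 \text{ or } i > \ell(X), \\
a \circ |X|_{C'}^{i+1} & \text{if } \sigma_i(X) = /a, \\
a \circ |X|_{C'}^{i-1} & \text{if } \sigma_i(X) = \backslash a, \\
|X|_{C'}^{i-1} \trianglelefteq a \trianglerighteq |X|_{C'}^{i+1} & \text{if } \sigma_i(X) = +a, \\
|X|_{C'}^{i+1} \trianglelefteq a \trianglerighteq |X|_{C'}^{i-1} & \text{if } \sigma_i(X) = -a, \\
|X|_{C'}^{i+k} & \text{if } \sigma_i(X) = /\#k, \\
|X|_{C'}^{i-k} & \text{if } \sigma_i(X) = \backslash\#k, \\
D & \text{if } \sigma_i(X) = \#, \\
S & \text{if } \sigma_i(X) = {!}.
\end{cases}
\]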



3.4.1 Behavior Preserving Homomorphisms 

Now that the behavior of every C'-expression has been specified, we can answer the question
whether C' is more or less expressive than C. It turns out that these code semigroups are
equally expressive, because we can define behavior preserving homomorphisms from C to
C' and vice versa.

First, we define a homomorphism $f : \mathcal{I}_C^+ \to \mathcal{I}_{C'}^+$ on individual instructions as follows:

\[
\begin{array}{l@{\qquad}l}
/a \mapsto /a;\ /\#4;\ \#;\ \#;\ \backslash\#4, & +/a \mapsto /\#2;\ /\#4;\ +a;\ /\#7;\ \backslash\#2, \\
\backslash a \mapsto /\#4;\ \#;\ \#;\ \backslash\#4;\ \backslash a, & -/a \mapsto /\#2;\ /\#4;\ -a;\ /\#7;\ \backslash\#2, \\
/\#k \mapsto /\#5k;\ \#;\ \#;\ \#;\ \backslash\#4, & +\backslash a \mapsto /\#2;\ \backslash\#2;\ +a;\ \backslash\#9;\ \backslash\#2, \\
\backslash\#k \mapsto /\#4;\ \#;\ \#;\ \#;\ \backslash\#5k, & -\backslash a \mapsto /\#2;\ \backslash\#2;\ -a;\ \backslash\#9;\ \backslash\#2, \\
{!} \mapsto {!};\ \#;\ \#;\ \#;\ {!}. &
\end{array}
\]

Every C instruction is mapped onto five C' instructions. Observe that f is left-right
uniformly behavior preserving. An alternative definition of f could map every C instruction
onto four C' instructions, at the expense of being only left or right uniformly behavior
preserving.

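The same holds for the homomorphism $g : \mathcal{I}_{C'}^+ \to \mathcal{I}_C^+$. One can define left or right
uniformly behavior preserving homomorphisms which map every C' instruction onto three
C instructions. Here, however, we define g such that it is left-right uniformly behavior
preserving:

\[
\begin{array}{l@{\qquad}l}
/a \mapsto /a;\ /\#3;\ \#;\ \backslash\#3, & /\#k \mapsto /\#4k;\ \#;\ \#;\ \backslash\#3, \\
\backslash a \mapsto /\#3;\ \#;\ \backslash\#3;\ \backslash a, & \backslash\#k \mapsto /\#3;\ \#;\ \#;\ \backslash\#4k, \\
+a \mapsto +/a;\ \backslash\#2;\ /\#2;\ \backslash\#3, & \# \mapsto \#;\ \#;\ \#;\ \#, \\
-a \mapsto -/a;\ \backslash\#2;\ /\#2;\ \backslash\#3, & {!} \mapsto {!};\ \#;\ \#;\ {!}.
\end{array}
\]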



Chapter

The semigroup C introduced in the previous chapter provides two ways to skip one or more
instructions during execution: using a test instruction and using a jump instruction. In both 
cases the location of the target instruction (if present) is at a fixed distance from the source 
instruction. In other words, the distance over which control of execution is transferred is 
static and does not depend on the context (i.e., the instructions surrounding the instruction 
which is currently being executed). As a result, inserting a single instruction at an arbitrary 
position in some instruction sequence may completely alter its semantics. 

To alleviate this problem somewhat, we will introduce an alternative means to transfer 
control of execution over arbitrary distances within an instruction sequence. This chapter 
defines the semigroup Cg, a close cousin of C. Cg employs label instructions to mark 
specific positions within an instruction sequence with a natural number (a label number). 
Goto instructions can then specify such a label number as the target of a jump. 

Cg's instruction set is introduced in §4.1. The semantics of Cg-expressions are formalized
in §4.2. This chapter then proceeds with §4.3, §4.4 and §4.5, in which certain properties
of label and goto instructions are analyzed and in which some useful transformations of
Cg-expressions are defined. Combined, these sections provide us with the tools required to
analyze Cg and its relation to C and PGA in Chapter 5. Finally, §4.6 briefly discusses an
alternative semantics for goto instructions. After defining behavior preserving endomorphisms
on Cg to demonstrate that this alternative semantics does not affect Cg's expressiveness,
we will not consider it any further.

4.1 The Instruction Set 

The semigroup Cg has basic instructions as well as positive and negative test instructions,
just like C. Unlike C, Cg does not have relative jumps /#k and \#k. Instead, it has a set
of label instructions $\mathfrak{L}$ and a set of goto instructions $\mathfrak{G}$.^

Label instructions mark a specific location within an instruction sequence with a natural
number l. They come in a forward as well as a backward oriented flavor, which determines
whether the instruction to respectively the right or left of the label instruction is executed
next. Goto instructions too are marked with a natural number l and jump to the first label
l with the same orientation in the appropriate direction.

^The notation for label and goto instructions is borrowed from [BL02, PvdZ06], which define a
language PGLDg as part of the PGA language hierarchy. In PGLDg, there are label instructions £l and goto
instructions ##£l, for all $l \in \mathbb{N}$.

Formally, the instruction set $\mathcal{I}_{Cg} = \mathfrak{L} \cup \mathfrak{G} \cup (\mathcal{I}_C - \mathfrak{J})$ generates the semigroup $Cg = (\mathcal{I}_{Cg}^+, \_;\_)$.
Note that since Cg has basic instructions and test instructions, Cg takes an
implicit parameter A of actions, just like C. Examples of Cg-expressions include:

+/a; #,     /b; /##£0; /a; /£0; !,     /b; /£3; +/a; \##£3,     \£5; -\c.     (4.1)

Before formalizing Cg's semantics, let us first informally describe what the intended behavior
of labels and gotos is.

/£l is a forward label instruction. Execution of /£l simply causes the instruction to its
right to be executed, if it exists. Otherwise deadlock occurs.

\£l is a backward label instruction. It is analogous to /£l, except that execution continues
with the instruction to its left.

/##£l is a forward goto instruction. It transfers control of execution to the nearest /£l
instruction to its right, if such an instruction exists. Otherwise deadlock occurs.

\##£l is a backward goto instruction. This instruction will cause execution to continue at
the nearest \£l instruction to its left. And of course, if such a label does not exist,
deadlock will result.

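For convenience we will write $\mathfrak{LG}$ for the set $\mathfrak{L} \cup \mathfrak{G}$. The function $\lambda : \mathfrak{LG} \to \mathbb{N}$ returns the
label number of a given label or goto instruction (e.g., $\lambda(/\pounds 6) = 6$). As with C-instructions,
we will define two sets $\mathcal{I}_{Cg}^{\rightarrow} \subset \mathcal{I}_{Cg}$ and $\mathcal{I}_{Cg}^{\leftarrow} \subset \mathcal{I}_{Cg}$, which consist of forward and backward
Cg-instructions respectively. That is,

\[
\begin{aligned}
\mathcal{I}_{Cg}^{\rightarrow} &= (\mathcal{I}_C^{\rightarrow} \cap \mathcal{I}_{Cg}) \cup \bigcup_{l \in \mathbb{N}} \{/\pounds l,\ /\#\#\pounds l\}, \\
\mathcal{I}_{Cg}^{\leftarrow} &= (\mathcal{I}_C^{\leftarrow} \cap \mathcal{I}_{Cg}) \cup \bigcup_{l \in \mathbb{N}} \{\backslash\pounds l,\ \backslash\#\#\pounds l\}.
\end{aligned}
\]

Clearly $\mathcal{I}_{Cg}^{\rightarrow} \cap \mathcal{I}_{Cg}^{\leftarrow} = \emptyset$ and $\mathcal{I}_{Cg}^{\rightarrow} \cup \mathcal{I}_{Cg}^{\leftarrow} \cup \{\#, !\} = \mathcal{I}_{Cg}$. The sets $\mathfrak{L}^{\rightarrow}$, $\mathfrak{L}^{\leftarrow}$, $\mathfrak{G}^{\rightarrow}$, $\mathfrak{G}^{\leftarrow}$, $\mathfrak{LG}^{\rightarrow}$ and
$\mathfrak{LG}^{\leftarrow}$ are defined as one would expect them to be. Likewise for the directionality relation
$\sim\ \subseteq \mathcal{I}_{Cg} \times \mathcal{I}_{Cg}$.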

Examples We will formalize Cg's semantics in §4.2 below. Still, to create or improve
an intuitive understanding of Cg-expressions and how they differ from C-expressions, let us
briefly describe the behavior of the Cg-inseqs of (4.1). As before, we specify that execution
starts at the leftmost position.

• +/a; #: Performs action a, after which deadlock occurs. This Cg-expression is also a
valid C-expression.

• /b; /##£0; /a; /£0; !: Performs action b and then terminates. Action a is not
performed, since the second instruction is a goto instruction which causes execution to
continue at position 4.

• /b; /£3; +/a; \##£3: Performs action b followed by action a. Then deadlock results.
The action a is not repeated, regardless of the value returned by the execution
environment, because the backward goto instruction will not transfer control of execution
to the forward label instruction: their directionality does not match.

• \£5; -\c: Deadlock. After execution of a backward label instruction the instruction
to its left is executed. Here, no such instruction is present.



Orphaned Goto Instructions A goto instruction in some Cg-inseq X which causes
deadlock (by lack of a "matching" label instruction) will be called orphaned. In other
words, given some $X \in \mathcal{I}_{Cg}^+$ and $i \in \mathfrak{G}(X)$, the ith instruction of X is orphaned iff i is an
exit position in X.

Note that although some Cg-expression X may contain labels /£l and \£l, this does not
preclude the possibility that X contains a goto instruction /##£l or \##£l which matches
neither of these labels (and is thus orphaned). For example, in the following expression both
goto instructions are orphaned:

/£0; \##£0; /##£0; \£0.

The C programming language [ISO99, KR88] (not to be confused with the code semigroup
C) allows statements within functions to be marked using labels. The statement goto lbl;
causes program execution to continue at the statement marked with label lbl, provided that
lbl is a label within the same function. The Java programming language [GJSB05] allows
the labeling of code blocks. The statement break lbl; is valid only inside a block labeled
lbl, and indicates that program execution must be resumed after block lbl.^

This shows that C and Java, just like the semigroup Cg, restrict the scope of label
and goto statements. The statements goto lbl; and break lbl; may prevent successful
compilation of a C or Java program X, even when X contains (multiple) statements labeled
with lbl, because of non-overlapping scopes.

When a C or Java compiler encounters a goto or break statement which references
a non-existent or out-of-scope label it may^ yield an error claiming that a certain label
is undefined. Such an error message seems to lay the "blame" for the failure to compile
on the non-existence of some label l, rather than on the incorrectly defined goto (break)
statement. Using the term "orphaned" allows us to indicate that some goto instruction does
not have a matching label instruction without blaming any specific label instruction or label
number.

4.2 Semantics 

As goto instructions transfer control to the nearest label instruction (if present) in the
appropriate direction, their semantics depend on the position of said label instruction. In
order to make this relation precise, we define two search functions,

\[
\begin{aligned}
\overrightarrow{\mathrm{SEARCH}}(X, i, S) &= \min(\{j \mid j > i,\ \sigma_j(X) \in S\} \cup \{\ell(X) + 1\}), \\
\overleftarrow{\mathrm{SEARCH}}(X, i, S) &= \max(\{j \mid j < i,\ \sigma_j(X) \in S\} \cup \{0\}).
\end{aligned}
\]

$\overrightarrow{\mathrm{SEARCH}}$ performs a forward search in a given inseq X, starting at position i, for any
instruction in S. The first position in X containing one such instruction is returned. If no
instruction from S is found then the first position outside of X (i.e., $\ell(X) + 1$) is returned.
$\overleftarrow{\mathrm{SEARCH}}$ behaves nearly identically, except that it searches from right to left, and returns 0 if
no instruction is found. Both functions have type $\mathcal{I}_{Cg}^+ \times \mathbb{Z} \times \mathcal{P}(\mathcal{I}_{Cg}) \to \mathbb{N}$, where $\mathcal{P}(\mathcal{I}_{Cg})$
denotes the powerset of $\mathcal{I}_{Cg}$.
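The two search functions translate almost literally into Python. The names `search_fwd` and `search_bwd` and the list-of-strings encoding of an inseq are assumptions of this sketch; the set S is passed as a Python set of instruction strings.

```python
def search_fwd(X, i, S):
    """Smallest position j > i holding an instruction in S, else len(X) + 1."""
    hits = [j for j in range(max(i + 1, 1), len(X) + 1) if X[j - 1] in S]
    return min(hits + [len(X) + 1])


def search_bwd(X, i, S):
    """Largest position j < i holding an instruction in S, else 0."""
    hits = [j for j in range(1, min(i, len(X) + 1)) if X[j - 1] in S]
    return max(hits + [0])


X = ["/£0", "\\##£0", "/##£0", "\\£0"]
print(search_fwd(X, 3, {"/£0"}), search_bwd(X, 2, {"\\£0"}))
```

In the usage line both gotos of the earlier example come out as orphaned: the forward search falls off the right end and the backward search returns 0.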

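As with PGA and C, we will formally define the semantics of Cg-expressions using basic
thread algebra. Let $|\_|_{Cg}^{\rightarrow} : \mathcal{I}_{Cg}^+ \to \mathrm{BTA}^{\mathrm{reg}}$ be the function that yields the behavior of a given
Cg-expression when executed starting with the leftmost instruction. That is, $|\_|_{Cg}^{\rightarrow}$ defines
its left behavior. Likewise $|\_|_{Cg}^{\leftarrow} : \mathcal{I}_{Cg}^+ \to \mathrm{BTA}^{\mathrm{reg}}$ yields the right behavior of a given
Cg-expression. As with C, we identify $|X|_{Cg}^{\rightarrow}$ and $|X|_{Cg}^{\leftarrow}$ with $|X|_{Cg}^{1}$ and $|X|_{Cg}^{\ell(X)}$, respectively,
and define auxiliary functions $|\_|_{Cg}^{i} : \mathcal{I}_{Cg}^+ \to \mathrm{BTA}^{\mathrm{reg}}$ for all $i \in \mathbb{Z}$, such that for all $X \in \mathcal{I}_{Cg}^+$,

\[
|X|_{Cg}^{i} =
\begin{cases}
D & \text{if } i < 1 \text{ or } i > \ell(X), \\
a \circ |X|_{Cg}^{i+1} & \text{if } \sigma_i(X) = /a, \\
|X|_{Cg}^{i+1} \trianglelefteq a \trianglerighteq |X|_{Cg}^{i+2} & \text{if } \sigma_i(X) = +/a, \\
|X|_{Cg}^{i+2} \trianglelefteq a \trianglerighteq |X|_{Cg}^{i+1} & \text{if } \sigma_i(X) = -/a, \\
|X|_{Cg}^{i+1} & \text{if } \sigma_i(X) = /\pounds l, \\
|X|_{Cg}^{\overrightarrow{\mathrm{SEARCH}}(X, i, \{/\pounds l\})} & \text{if } \sigma_i(X) = /\#\#\pounds l, \\
a \circ |X|_{Cg}^{i-1} & \text{if } \sigma_i(X) = \backslash a, \\
|X|_{Cg}^{i-1} \trianglelefteq a \trianglerighteq |X|_{Cg}^{i-2} & \text{if } \sigma_i(X) = +\backslash a, \\
|X|_{Cg}^{i-2} \trianglelefteq a \trianglerighteq |X|_{Cg}^{i-1} & \text{if } \sigma_i(X) = -\backslash a, \\
|X|_{Cg}^{i-1} & \text{if } \sigma_i(X) = \backslash\pounds l, \\
|X|_{Cg}^{\overleftarrow{\mathrm{SEARCH}}(X, i, \{\backslash\pounds l\})} & \text{if } \sigma_i(X) = \backslash\#\#\pounds l, \\
D & \text{if } \sigma_i(X) = \#, \\
S & \text{if } \sigma_i(X) = {!}.
\end{cases}
\tag{4.2}
\]

As with PGA and C, we equate an infinite sequence of left-to-right derivations according to
(4.2) which does not yield an action with deadlock:

    If the equations in (4.2) can be applied infinitely often from left to right
    without ever yielding an action, then the extracted thread is D.        (4.3)

^It is actually not quite as simple as this, because of Java's support for exception handling. Furthermore,
the continue keyword can also be supplied with an optional label, but only if said label precedes an iteration
statement, not just any code block. Also note that Java (currently) does not provide a "regular" goto
statement, although the language does identify goto as a reserved keyword.

^Tested with gcc 4.3.3 and javac 1.6.0_14.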



This rule is specifically applicable to infinite loops created using label and goto instructions.
For example, $|/\pounds 1;\ \backslash\pounds 2|_{Cg}^{\rightarrow} = D$, because

\[ |/\pounds 1;\ \backslash\pounds 2|_{Cg}^{1} = |/\pounds 1;\ \backslash\pounds 2|_{Cg}^{2} = |/\pounds 1;\ \backslash\pounds 2|_{Cg}^{1}. \]

This example allows for an interesting observation: label instructions can act as control
structures even in absence of a matching goto instruction. Another example is the program
/£5; \a, whose left as well as right behavior is described by the equation $P = a \circ P$. In
this sense Cg's label instructions are quite unlike labels in C or Java, where labels cannot
alter the flow of control in absence of another statement which references said label (such
as goto).

Cg, like C, characterizes the regular threads (as stated by Proposition 3.1). We will not
prove that fact here; instead we refer to Proposition 6.7 in §6.2. For completeness we end
this chapter with the left and right behavior of the examples of §4.1:



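\[
\begin{aligned}
|{+/a};\ \#|_{Cg}^{\rightarrow} &= a \circ D, & |{+/a};\ \#|_{Cg}^{\leftarrow} &= D, \\
|/b;\ /\#\#\pounds 0;\ /a;\ /\pounds 0;\ {!}|_{Cg}^{\rightarrow} &= b \circ S, & |/b;\ /\#\#\pounds 0;\ /a;\ /\pounds 0;\ {!}|_{Cg}^{\leftarrow} &= S, \\
|/b;\ /\pounds 3;\ {+/a};\ \backslash\#\#\pounds 3|_{Cg}^{\rightarrow} &= b \circ a \circ D, & |/b;\ /\pounds 3;\ {+/a};\ \backslash\#\#\pounds 3|_{Cg}^{\leftarrow} &= D, \\
|\backslash\pounds 5;\ {-\backslash c}|_{Cg}^{\rightarrow} &= D, & |\backslash\pounds 5;\ {-\backslash c}|_{Cg}^{\leftarrow} &= c \circ D.
\end{aligned}
\]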
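The behaviors listed above can be checked with a small interpreter that reads (4.2) from left to right and applies rule (4.3). This is a sketch under our own assumptions: an inseq is a list of instruction strings, the environment is a finite list of boolean replies, and the name `run_cg` is hypothetical. The set `idle` records the label and goto positions visited since the last action, so revisiting one of them signals a loop without activity.

```python
def run_cg(X, start, replies):
    """Run the Cg-inseq X from 1-based position `start`.

    Returns (actions, outcome) with outcome 'S', 'D', or None when the
    scripted replies run out before the run ends.
    """
    replies = iter(replies)
    i, actions, idle = start, [], set()
    while 1 <= i <= len(X):
        u = X[i - 1]
        if u in ("!", "#"):
            return actions, ("S" if u == "!" else "D")
        core = u.lstrip("+-")
        step, body = (1 if core[0] == "/" else -1), core[1:]
        if "£" in body:                          # label or goto: no action
            if i in idle:
                return actions, "D"              # rule (4.3)
            idle.add(i)
            i += step
            if body.startswith("##"):            # goto: search matching label
                label = u.replace("##", "")
                while 1 <= i <= len(X) and X[i - 1] != label:
                    i += step
            continue
        reply = next(replies, None)
        if reply is None:
            return actions, None
        actions.append(body)
        idle.clear()
        sign = u[0] if u[0] in "+-" else ""
        adjacent = sign == "" or reply == (sign == "+")
        i += step if adjacent else 2 * step
    return actions, "D"                          # left the inseq


print(run_cg(["/b", "/##£0", "/a", "/£0", "!"], 1, [True]))
print(run_cg(["/£5", "\\a"], 1, [True, True]))
```

The second call illustrates the thread $P = a \circ P$: the run only stops because the scripted replies are exhausted.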



4.2.1 Accessibility and Exit Positions 

The accessibility relation defined on C-inseqs X by Definition 3.2 is defined analogously
on Cg-expressions. The same holds for the set $\mathcal{R}_{X,i}$ of instruction positions reachable from
position i in X and its complement $\overline{\mathcal{R}}_{X,i}$ (see Definition 3.3). The set of exit positions $\mathcal{E}_X$
of a Cg-inseq X is defined as in Definition 3.4.

Note that for Cg-expressions the notion of accessibility and reachability is in a sense more
"artificial" than for C-expressions. This is so because for any orphaned goto instruction on
some position i in an inseq X it is the case that either $i \to_X 0$ or $i \to_X \ell(X) + 1$, due to
the definition of the functions $\overrightarrow{\mathrm{SEARCH}}$ and $\overleftarrow{\mathrm{SEARCH}}$.

We conclude this section with a result analogous to Proposition 3.5.

Proposition 4.1. Every regular thread can be described by a Cg instruction sequence in
which every instruction is reachable from the start instruction.

Proof. Consider arbitrary $T \in \mathrm{BTA}^{\mathrm{reg}}$, $X \in \mathcal{I}_{Cg}^+$ and $i \in [1, \ell(X)]$ such that $|X|_{Cg}^{i} = T$. If
$\overline{\mathcal{R}}_{X,i} \cap [1, \ell(X)] = \emptyset$, then T, X and i meet the requirements. Otherwise, arbitrarily select
some unreachable position $j \in \overline{\mathcal{R}}_{X,i} \cap [1, \ell(X)]$.

To see why j can be removed from X without problems, we need to make two observations.
First, any instruction which transfers control of execution to position j must itself
be unreachable. Second, any instruction which transfers control of execution over position
j must be a goto instruction; the behavior of such an instruction will not be affected by the
removal of the instruction at position j (for $\sigma_j(X)$ cannot be a matching label instruction).

The result of removing the instruction at position j from X is an inseq X' such that
either $|X'|_{Cg}^{i-1} = T$ or $|X'|_{Cg}^{i} = T$, depending on whether j < i or j > i, respectively. This
process can be repeated until all unreachable instructions are removed. □

4.3 Normalizing Label Numbers 

Cg-expressions can contain identical goto instructions which, when executed, cause a jump 
to distinct positions within the instruction sequence. Likewise, identical label instructions 
can occur multiple times within an expression. For example. 



say that the identical instructions in this expression are not semantically related. In this 
section we will make the notion of a semantical relation between label and goto instructions 
more precise. This endeavor is motivated by the observation that reasoning about a Cg- 
expression X is greatly simplified if any two label and goto instructions in X with the same 
label number and direction are known to be related in certain ways. 

Definition 4.2. Let $X \in \mathcal{I}_{Cg}^+$. If $i, j \in \mathfrak{LG}(X)$, $\sigma_i(X) \sim \sigma_j(X)$ and $\lambda(\sigma_i(X)) = \lambda(\sigma_j(X))$,
then the label/goto instructions at positions i and j have the same label number and
direction, and are said to correspond, written $i \approx_X j$.

If $i \in \mathfrak{G}(X)$, $j \in \mathfrak{L}(X)$ and $i \to_X j$, then the goto instruction at position i targets the
label instruction at position j, written $i \curvearrowright_X j$.

If $i, j \in \mathfrak{G}(X)$, $i =_X j$ and $\exists k (i \to_X k \wedge j \to_X k)$, then the identical goto instructions at
positions i and j are said to be target equivalent, written $i \curlyvee_X j$. Note that target equivalent
goto instructions can be orphaned. Also, non-target equivalent goto instructions need not
be distinct, as in (4.4).

Let $\curvearrowright_X^{-1}$ be the inverse of $\curvearrowright_X$. We define

\[ \bigstar_X \;=\; \curlyvee_X \cup \curvearrowright_X \cup \curvearrowright_X^{-1} \cup \{(i, i) \mid i \in \mathfrak{L}(X)\}. \]

Instructions at positions $i, j \in \mathfrak{LG}(X)$ are related iff $i \bigstar_X j$. X is in label normal form (LNF)
iff $i \approx_X j$ implies $i \bigstar_X j$ for all $i, j \in \mathfrak{LG}(X)$. That is, X is in LNF if and only if any pair of
corresponding instructions is related.

Proposition 4.3. For all $X \in \mathcal{I}_{Cg}^+$, $\bigstar_X$ is an equivalence relation on $\mathfrak{LG}(X)$.

Proof. Let $I = \{(i, i) \mid i \in \mathfrak{L}(X)\}$. $\bigstar_X$ is reflexive since $I \subseteq \bigstar_X$ and $i \curlyvee_X i$ for all $i \in \mathfrak{G}(X)$.
$\bigstar_X$ is symmetric because $\curlyvee_X$, $(\curvearrowright_X \cup \curvearrowright_X^{-1})$ and I are. What remains to be proved is that
$\bigstar_X$ is transitive. To that end, let i, j and k be distinct program positions with $i \bigstar_X j$ and
$j \bigstar_X k$. We distinguish three situations:

• If $i \curlyvee_X j$ then either $j \curlyvee_X k$, in which case $i \curlyvee_X k$, or $j \curvearrowright_X k$, in which case $i \curvearrowright_X k$.

• If $i \curvearrowright_X j$, then $j \curvearrowright_X^{-1} k$, and hence $i \curlyvee_X k$.

• If $i \curvearrowright_X^{-1} j$, then $j \curlyvee_X k$, meaning that $k \curvearrowright_X i$ and hence $i \curvearrowright_X^{-1} k$. (Note that $j \curvearrowright_X k$
will not be the case because that would mean i = k, while we defined i and k to be
distinct positions.) □

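Proposition 4.4. Let $X \in \mathcal{I}_{Cg}^+$ be in label normal form. Then the following properties hold
for all $1 \le i, j \le \ell(X)$:

(a) If $i \in \mathfrak{G}(X)$, $j \in \mathfrak{L}(X)$ and $i \approx_X j$, then $i \curvearrowright_X j$ (label instructions are targeted by every
goto instruction with the same label number and directionality).

(b) If $i, j \in \mathfrak{L}(X)$ and $i =_X j$, then i = j (all label instructions in X are distinct).

(c) If $i, j \in \mathfrak{G}(X)$ and $i =_X j$, then $i \curlyvee_X j$ (identical goto instructions are target equivalent).

Proof. Let $X \in \mathcal{I}_{Cg}^+$ be in LNF. Note that $i =_X j$ implies $i \approx_X j$ for all $i, j \in \mathfrak{LG}(X)$.
Since X is in LNF, $i \approx_X j$ implies $i \bigstar_X j$. In order, the properties follow from the following
identities:

\[
\begin{aligned}
\bigstar_X \cap (\mathfrak{G}(X) \times \mathfrak{L}(X)) &= \curvearrowright_X, \\
\bigstar_X \cap (\mathfrak{L}(X) \times \mathfrak{L}(X)) &= \{(i, i) \mid i \in \mathfrak{L}(X)\}, \\
\bigstar_X \cap (\mathfrak{G}(X) \times \mathfrak{G}(X)) &= \curlyvee_X. \qquad \Box
\end{aligned}
\]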

Proposition 4.5. For any $X \in \mathcal{I}_{Cg}^{+}$ there exists an $X' \in \mathcal{I}_{Cg}^{+}$ such that $X'$ is in label
normal form and $|X|^{i}_{Cg} = |X'|^{i}_{Cg}$ for all $i \in \mathbb{Z}$.

Proof. Let $X \in \mathcal{I}_{Cg}^{+}$. $\star_X$ is an equivalence relation on $\mathfrak{LG}(X)$. Let $[i]_{\star_X}$ be the equivalence
class of $i$ and let $\mathfrak{LG}(X)/\star_X$ be the quotient set of $\mathfrak{LG}(X)$ by $\star_X$. Let $n = |\mathfrak{LG}(X)/\star_X|$
be the number of equivalence classes. Now select a bijective mapping $f$ from $\mathfrak{LG}(X)/\star_X$
onto $[1, n]$, and construct an inseq $X'$ by changing the label numbers of each label and goto
instruction in $X$ such that $\lambda(\sigma_i(X')) = f([i]_{\star_X})$ for all $i \in \mathfrak{LG}(X)$. Then $X'$ is in LNF and
clearly $|X|^{i}_{Cg} = |X'|^{i}_{Cg}$ for all $i \in \mathbb{Z}$. □



4.4 Freeing Label Numbers 



In this section we will briefly describe how certain label numbers can be removed from a 
Cg-inseq. It turns out that defining certain behavior preserving mappings on Cg instruction 
sequences is greatly simplified if one can assume that no label or goto instruction in the input 
inseq has a label number present in some set L. 

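Definition 4.6. A label number $l$ is available in a Cg-expression $X$ if there is no $u \in X$
such that $\lambda(u) = l$. That is, no label or goto instruction in $X$ has label number $l$. To make a
specific label number available, it must be freed. For each $l \in \mathbb{N}$ we define an endomorphism
$F_l$ which frees label number $l$ in a given Cg-inseq. $F_l : \mathcal{I}_{Cg}^{+} \to \mathcal{I}_{Cg}^{+}$ is defined on individual
instructions as follows:

\[
F_l(u) =
\begin{cases}
/\pounds l'{+}1 & \text{if } u = /\pounds l' \text{ and } l' \geq l,\\
\backslash\pounds l'{+}1 & \text{if } u = \backslash\pounds l' \text{ and } l' \geq l,\\
/\#\#\pounds l'{+}1 & \text{if } u = /\#\#\pounds l' \text{ and } l' \geq l,\\
\backslash\#\#\pounds l'{+}1 & \text{if } u = \backslash\#\#\pounds l' \text{ and } l' \geq l,\\
u & \text{otherwise.}
\end{cases}
\tag{4.5}
\]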



Some behavior preserving mappings require several label numbers to be available. Let
$L = (l_1, l_2, \ldots, l_n)$ be an arbitrary finite sequence of natural numbers. Then $F_L$ is the
endomorphism which frees the label numbers in $L$ in order. Formally, $F_L = F_{l_n} \circ \cdots \circ F_{l_2} \circ F_{l_1}$.

Proposition 4.7. Let $l \in \mathbb{N}$ and let $L$ be an arbitrary finite sequence of natural numbers.
Then the endomorphisms $F_l$ and $F_L$ are left-right uniformly behavior preserving. Moreover,
if $L$ is monotonically nondecreasing, then for every $X \in \mathcal{I}_{Cg}^{+}$, all label numbers in $L$ are
available in $F_L(X)$.

Proof. $F_l$ maps individual instructions onto individual instructions and alters only the label
number of label and goto instructions with a label number $\geq l$. Execution of a label
instruction causes the instruction to its left or right to be executed, depending on the
label's orientation, but irrespective of the actual label number. It is not hard to see that
likewise the position to which goto instructions transfer control of execution is not affected
by the application of $F_l$. Thus $F_l$ is left-right uniformly behavior preserving. As $F_L$ can be
decomposed into individual applications of functions $F_{l_1}, \ldots, F_{l_n}$, the same holds for $F_L$.

Since $F_l$ only increments label numbers $\geq l$, label number $l$ itself is available in $F_l(X)$, and
any label number $< l$ which is available in some inseq $X$ will also be available in $F_l(X)$. It
follows that if $L$ is monotonically nondecreasing, then all $l \in L$ will be available in $F_L(X)$. □
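
The freeing endomorphisms can be sketched in a few lines of Python. The tuple encoding of instructions and the names `free_label`, `free_labels` and `label_numbers` are assumptions made for this illustration only; they do not occur in the text. The sketch also shows why the "monotonically nondecreasing" condition of Proposition 4.7 matters.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical encoding: ("label" | "goto", direction, number); any other
# instruction is an opaque tuple such as ("basic", "/", "a").
Instr = Tuple


def free_label(l: int, X: Sequence[Instr]) -> List[Instr]:
    """F_l: add 1 to every label number >= l on label and goto instructions."""
    out = []
    for u in X:
        if u[0] in ("label", "goto") and u[2] >= l:
            out.append((u[0], u[1], u[2] + 1))
        else:
            out.append(u)
    return out


def free_labels(L: Sequence[int], X: Sequence[Instr]) -> List[Instr]:
    """F_L = F_{l_n} o ... o F_{l_1}: free the numbers of L in the given order."""
    result = list(X)
    for l in L:
        result = free_label(l, result)
    return result


def label_numbers(X: Sequence[Instr]) -> Set[int]:
    """Label numbers that are NOT available in X."""
    return {u[2] for u in X if u[0] in ("label", "goto")}


X = [("goto", "/", 0), ("label", "/", 1), ("basic", "/", "a"), ("label", "\\", 5)]
Y = free_labels([0, 1, 2], X)   # nondecreasing: 0, 1 and 2 all become available
Z = free_labels([2, 0], X)      # decreasing: freeing 0 afterwards re-occupies 2
```

With a nondecreasing sequence every freed number stays free, because later applications only shift numbers that are at least as large. In the decreasing case, label number 1 is shifted onto the previously freed number 2.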

4.5 Cg and Relative Jumps 

Cg does not have explicit relative jump instructions like C. Yet in Cg, too, some instructions
transfer control of execution relative to their own position: basic instructions, test
instructions and label instructions do so. For example, the label instruction /£6 transfers
control to the instruction to its immediate right, equivalent to a forward relative jump over
distance 1.

Section 4.6 below defines endomorphisms on Cg in order to simulate an alternative
semantics for goto instructions. These endomorphisms map single instructions onto a fixed
number b of different instructions. Under those circumstances care must be taken that
instructions which perform an implicit relative jump behave properly: all relative jump
distances are multiplied by b.

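So how does this work? In this section we will describe how relative jumps over distances
up to some arbitrary value $k$ can be emulated using label and goto instructions. As a first
step, consider the following family of Cg-inseqs, defined for every $l \geq 1$ and $k \geq 2$:

\[
\begin{aligned}
D_l^{\rightarrow} &= /\pounds l;\ /\#\#\pounds l{-}1, &
\mathrm{LEFT}_k &= D_1^{\rightarrow};\ \backslash\pounds 0;\ D_2^{\rightarrow};\ D_3^{\rightarrow};\ \ldots;\ D_k^{\rightarrow},\\
D_l^{\leftarrow} &= \backslash\#\#\pounds l{-}1;\ \backslash\pounds l, &
\mathrm{RIGHT}_k &= D_k^{\leftarrow};\ \ldots;\ D_3^{\leftarrow};\ D_2^{\leftarrow};\ /\pounds 0;\ D_1^{\leftarrow}.
\end{aligned}
\]

The Cg-expressions $\mathrm{LEFT}_k$ and $\mathrm{RIGHT}_k$ contain alternating label and goto instructions, and
an extra label with label number 0. $\mathrm{LEFT}_k$ and $\mathrm{RIGHT}_k$ are meant to be used as subsequences
of larger instruction sequences. Without going into the use of \£0 and /£0 for now, observe
that $\mathrm{LEFT}_k$ contains forward label instructions with label numbers 1 through $k$, each followed
by a forward goto instruction with a label number one less than the number of the preceding
label instruction. The same holds for $\mathrm{RIGHT}_k$, except that it contains backward label and
goto instructions, with each goto instruction preceding its label instruction.

Next, for all $k \in \mathbb{N}$, consider the family of functions $\phi_k : \mathcal{I}_{Cg} \to \mathcal{I}_{Cg}$, defined as

\[
\phi_k : u \mapsto
\begin{cases}
/\#\#\pounds 1 & \text{if } u = /\pounds l \text{ and } l \leq k,\\
\backslash\#\#\pounds 1 & \text{if } u = \backslash\pounds l \text{ and } l \leq k,\\
u & \text{otherwise.}
\end{cases}
\]

The functions $\phi_k$ map all label instructions with a label number not greater than $k$ to goto
instructions with label number 1.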
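We now combine $\mathrm{LEFT}_k$, $\mathrm{RIGHT}_k$ and $\phi_k$ to create endomorphisms $\mathrm{REL}_k : \mathcal{I}_{Cg}^{+} \to \mathcal{I}_{Cg}^{+}$, for
all $k \geq 2$, defined on individual instructions $u \in \mathcal{I}_{Cg}$ such that,

\[
\mathrm{REL}_k : u \mapsto
\begin{cases}
\mathrm{LEFT}_k;\ \backslash\#\#\pounds 2;\ \backslash\#\#\pounds 1;\ \phi_k(u);\ \backslash\pounds 0;\ \mathrm{RIGHT}_k & \text{if } u \text{ is a backward instruction},\\
\mathrm{LEFT}_k;\ /\pounds 0;\ \phi_k(u);\ /\#\#\pounds 1;\ /\#\#\pounds 2;\ \mathrm{RIGHT}_k & \text{otherwise.}
\end{cases}
\tag{4.6}
\]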

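The functions $\mathrm{REL}_k$ are not quite left or right behavior preserving. Instead, at some higher
level they redefine the semantics of goto instructions with a label number $l \leq k$, such that
their behavior mimics that of a relative jump over distance $l$. As a special case, /##£0
and \##£0 signify a jump over distance zero and as such yield deadlock.⁴ This alternative
semantics can be made explicit by defining thread extraction operators $|\_|^{i}_{Cg,k}$ which are
analogous to $|\_|^{i}_{Cg}$ except for the fact that the operators $|\_|^{i}_{Cg,k}$ are defined differently for
instances where $i \in \{j \in \mathfrak{G}(X) \mid \lambda(\sigma_j(X)) \leq k\}$:

\[
|X|^{i}_{Cg,k} =
\begin{cases}
\mathsf{D} & \text{if } i < 1 \text{ or } i > \ell(X),\\
a \circ |X|^{i+1}_{Cg,k} & \text{if } \sigma_i(X) = /a,\\
|X|^{i+1}_{Cg,k} \unlhd a \unrhd |X|^{i+2}_{Cg,k} & \text{if } \sigma_i(X) = +/a,\\
|X|^{i+2}_{Cg,k} \unlhd a \unrhd |X|^{i+1}_{Cg,k} & \text{if } \sigma_i(X) = -/a,\\
|X|^{i+1}_{Cg,k} & \text{if } \sigma_i(X) = /\pounds l,\\
\mathsf{D} & \text{if } \sigma_i(X) = /\#\#\pounds 0,\\
|X|^{i+l}_{Cg,k} & \text{if } \sigma_i(X) = /\#\#\pounds l \text{ and } 1 \leq l \leq k,\\
|X|^{\overrightarrow{\mathrm{search}}(X,i,\{/\pounds l\})}_{Cg,k} & \text{if } \sigma_i(X) = /\#\#\pounds l \text{ and } l > k,\\
a \circ |X|^{i-1}_{Cg,k} & \text{if } \sigma_i(X) = \backslash a,\\
|X|^{i-1}_{Cg,k} \unlhd a \unrhd |X|^{i-2}_{Cg,k} & \text{if } \sigma_i(X) = +\backslash a,\\
|X|^{i-2}_{Cg,k} \unlhd a \unrhd |X|^{i-1}_{Cg,k} & \text{if } \sigma_i(X) = -\backslash a,\\
|X|^{i-1}_{Cg,k} & \text{if } \sigma_i(X) = \backslash\pounds l,\\
\mathsf{D} & \text{if } \sigma_i(X) = \backslash\#\#\pounds 0,\\
|X|^{i-l}_{Cg,k} & \text{if } \sigma_i(X) = \backslash\#\#\pounds l \text{ and } 1 \leq l \leq k,\\
|X|^{\overleftarrow{\mathrm{search}}(X,i,\{\backslash\pounds l\})}_{Cg,k} & \text{if } \sigma_i(X) = \backslash\#\#\pounds l \text{ and } l > k,\\
\mathsf{D} & \text{if } \sigma_i(X) = \#,\\
\mathsf{S} & \text{if } \sigma_i(X) = {!}.
\end{cases}
\tag{4.7}
\]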

As an example, consider the Cg-inseq $X$ = /##£3; /£3; /a; /b and suppose that we want
to interpret all goto instructions with a label number $\leq 7$ as relative jumps. Then,

\[
|X|^{1}_{Cg,7} = |X|^{4}_{Cg,7} = b \circ |X|^{5}_{Cg,7} = b \circ \mathsf{D}.
\]

Observe that the goto instruction on position 1 transfers control of execution to position 4;
the label instruction with the matching label number at position 2 is bypassed.

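Fixing some $k \geq 2$, observe that $\mathrm{REL}_k$ maps every Cg-instruction onto $b_k = 4k + 6$ Cg-
instructions. $\mathrm{REL}_k$ is defined such that the following equality holds:

\[
|X|^{i}_{Cg,k} = |\mathrm{REL}_k(X)|^{b_k(i-1)+1}_{Cg} = |\mathrm{REL}_k(X)|^{b_k i}_{Cg}.
\]

Specifically,

\[
\begin{aligned}
|X|^{\rightarrow}_{Cg,k} &= |X|^{1}_{Cg,k} = |\mathrm{REL}_k(X)|^{1}_{Cg} = |\mathrm{REL}_k(X)|^{\rightarrow}_{Cg},\\
|X|^{\leftarrow}_{Cg,k} &= |X|^{\ell(X)}_{Cg,k} = |\mathrm{REL}_k(X)|^{\ell(\mathrm{REL}_k(X))}_{Cg} = |\mathrm{REL}_k(X)|^{\leftarrow}_{Cg}.
\end{aligned}
\]

It follows that the alternative semantics for Cg as defined by (4.7) can be simulated using
$\mathrm{REL}_k$ and Cg's default thread extraction operator.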
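
The simulation claim can be checked on small inputs with a minimal Python sketch. It rests on several assumptions that are not taken from the text: the tuple encoding of instructions, the function names, a goto targeting the nearest matching label in its direction, an orphaned goto yielding deadlock, and every test replying true. The block layout follows the reading of LEFT_k, RIGHT_k and REL_k given in this section.

```python
FWD, BWD = "/", "\\"

def left(k):
    # LEFT_k: /£1; /##£0; \£0; then /£l; /##£(l-1) for l = 2..k
    seq = [("label", FWD, 1), ("goto", FWD, 0), ("label", BWD, 0)]
    for l in range(2, k + 1):
        seq += [("label", FWD, l), ("goto", FWD, l - 1)]
    return seq

def right(k):
    # RIGHT_k: \##£(l-1); \£l for l = k..2; then /£0; \##£0; \£1
    seq = []
    for l in range(k, 1, -1):
        seq += [("goto", BWD, l - 1), ("label", BWD, l)]
    return seq + [("label", FWD, 0), ("goto", BWD, 0), ("label", BWD, 1)]

def rel(k, X):
    """REL_k: every instruction becomes a block of 4k+6 instructions."""
    out = []
    for u in X:
        v = ("goto", u[1], 1) if u[0] == "label" and u[2] <= k else u  # phi_k
        if len(u) == 3 and u[1] == BWD:
            mid = [("goto", BWD, 2), ("goto", BWD, 1), v, ("label", BWD, 0)]
        else:
            mid = [("label", FWD, 0), v, ("goto", FWD, 1), ("goto", FWD, 2)]
        out += left(k) + mid + right(k)
    return out

def run(X, start, k=None, max_actions=3, max_steps=10_000):
    """Trace of actions from position `start`. With k given, gotos with a
    label number <= k act as relative jumps (4.7); otherwise default Cg."""
    i, actions = start, []
    for _ in range(max_steps):
        if i < 1 or i > len(X) or X[i - 1][0] == "abort":
            return actions, "D"
        u = X[i - 1]
        if u[0] == "term":
            return actions, "S"
        step = 1 if u[1] == FWD else -1
        if u[0] in ("basic", "pos", "neg"):
            actions.append(u[2])
            if len(actions) == max_actions:
                return actions, "..."          # trace cut off
            i += 2 * step if u[0] == "neg" else step
        elif u[0] == "label":
            i += step
        elif k is not None and u[2] <= k:      # relative jump semantics
            if u[2] == 0:
                return actions, "D"
            i += step * u[2]
        else:                                  # search for the matching label
            j = i + step
            while 1 <= j <= len(X) and X[j - 1] != ("label", u[1], u[2]):
                j += step
            if not 1 <= j <= len(X):
                return actions, "D"            # orphaned goto
            i = j
    return actions, "D"

X = [("goto", FWD, 3), ("label", FWD, 3), ("basic", FWD, "a"), ("basic", FWD, "b")]
loop = [("basic", FWD, "a"), ("goto", BWD, 1)]
```

For the example inseq X, `run(X, 1, k=7)` and `run(rel(7, X), 1)` both produce the single action b followed by deadlock. The inseq `loop` exercises the backward half of the construction.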



⁴See also §3.1.1.



4.6 Label Instructions as More General Jump Targets



Cg's goto instructions are defined such that they transfer control to a label instruction 
with the same label number and directionality in the appropriate direction (if present). An 
obvious alternative behavior is for goto instructions to jump to a label instruction with 
the same label number in the appropriate direction, irrespective of its directionality (again, 
provided such instruction is present). Put more informally: instead of "accepting" jumps 
from a single direction, we may alter Cg's semantics such that label instructions accept 
jumps originating from goto instructions in either direction. In this section we play with 
this idea; it turns out that with respect to expressiveness nothing is gained or lost by using 
such an alternative semantics. Therefore we will not consider this idea beyond this section. 
As a result, readers may choose to skip this section. 

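This alternative semantics can be described by a thread extraction operator $|\_,\_|_{Cg'}$
which is nearly identical to the operator $|\_,\_|_{Cg}$ as defined by the set of equations (4.2) and
rule (4.3), except for the cases involving goto instructions. Specifically (now using the usual
shorthand notation $|X|^{i}_{Cg'}$ instead of $|i, X|_{Cg'}$):

\[
|X|^{i}_{Cg'} =
\begin{cases}
\mathsf{D} & \text{if } i < 1 \text{ or } i > \ell(X),\\
a \circ |X|^{i+1}_{Cg'} & \text{if } \sigma_i(X) = /a,\\
|X|^{i+1}_{Cg'} \unlhd a \unrhd |X|^{i+2}_{Cg'} & \text{if } \sigma_i(X) = +/a,\\
|X|^{i+2}_{Cg'} \unlhd a \unrhd |X|^{i+1}_{Cg'} & \text{if } \sigma_i(X) = -/a,\\
|X|^{i+1}_{Cg'} & \text{if } \sigma_i(X) = /\pounds l,\\
|X|^{\overrightarrow{\mathrm{search}}(X,i,\{/\pounds l,\,\backslash\pounds l\})}_{Cg'} & \text{if } \sigma_i(X) = /\#\#\pounds l,\\
a \circ |X|^{i-1}_{Cg'} & \text{if } \sigma_i(X) = \backslash a,\\
|X|^{i-1}_{Cg'} \unlhd a \unrhd |X|^{i-2}_{Cg'} & \text{if } \sigma_i(X) = +\backslash a,\\
|X|^{i-2}_{Cg'} \unlhd a \unrhd |X|^{i-1}_{Cg'} & \text{if } \sigma_i(X) = -\backslash a,\\
|X|^{i-1}_{Cg'} & \text{if } \sigma_i(X) = \backslash\pounds l,\\
|X|^{\overleftarrow{\mathrm{search}}(X,i,\{/\pounds l,\,\backslash\pounds l\})}_{Cg'} & \text{if } \sigma_i(X) = \backslash\#\#\pounds l,\\
\mathsf{D} & \text{if } \sigma_i(X) = \#,\\
\mathsf{S} & \text{if } \sigma_i(X) = {!}.
\end{cases}
\]

Observe that $\overrightarrow{\mathrm{search}}$ and $\overleftarrow{\mathrm{search}}$ now each search for two instructions, namely /£l and
\£l, for some $l \in \mathbb{N}$.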



4.6.1 Behavior Preserving Homomorphisms 



It turns out that this alternative semantics does not affect Cg's expressiveness. It is
straightforward to define a homomorphism $f$ such that $|X|^{i}_{Cg} = |f(X)|^{i}_{Cg'}$ for all $i \in \mathbb{Z}$ and
$X \in \mathcal{I}_{Cg}^{+}$. $f$ is defined on individual instructions $u \in \mathcal{I}_{Cg}$ such that,

\[
f : u \mapsto
\begin{cases}
/\pounds 2l & \text{if } u = /\pounds l,\\
\backslash\pounds 2l{+}1 & \text{if } u = \backslash\pounds l,\\
/\#\#\pounds 2l & \text{if } u = /\#\#\pounds l,\\
\backslash\#\#\pounds 2l{+}1 & \text{if } u = \backslash\#\#\pounds l,\\
u & \text{otherwise.}
\end{cases}
\]

Indeed $f$ ensures that any label number is even for forward label and goto instructions,
while it is odd for backward oriented instructions. As a result, label instructions in $f(X)$
will in practice "accept" jumps from goto instructions in only one direction, rendering the
difference between $|\_|_{Cg}$ and $|\_|_{Cg'}$ irrelevant.
Conversely, there exists a homomorphism $g$ such that for all $i \in \mathbb{Z}$ there exists some
$j \in \mathbb{Z}$ such that $|X|^{i}_{Cg'} = |g(X)|^{j}_{Cg}$. We define $g = \phi \circ \mathrm{REL}_2 \circ F_{(0,1,2)}$. The functions $F_{(0,1,2)}$
and $\mathrm{REL}_2$ have been defined previously by (4.5) and (4.6), respectively. The function $\phi$ is a
homomorphism, defined on individual Cg-instructions $u$ such that,

\[
\phi : u \mapsto
\begin{cases}
/\#\#\pounds l;\ \backslash\pounds l;\ /\pounds l & \text{if } u = /\pounds l \text{ with } l > 2,\\
\backslash\pounds l;\ /\pounds l;\ \backslash\#\#\pounds l & \text{if } u = \backslash\pounds l \text{ with } l > 2,\\
u & \text{otherwise.}
\end{cases}
\]

The correctness of $g$ hinges on three observations:

1. By Proposition 4.7, $F_{(0,1,2)}$ is behavior preserving.

2. The homomorphism $\mathrm{REL}_2$ alters the semantics of goto instructions with label numbers
$\leq 2$. These instructions are not present in its input because it is passed the output of
$F_{(0,1,2)}$. As such, $\mathrm{REL}_2 \circ F_{(0,1,2)}$ is also behavior preserving.

3. Lastly, $\phi$ does not replace label instructions introduced by $\mathrm{REL}_2$. It does replace all
other label instructions, such that the resulting subsequence of three instructions mimics
the behavior of label instructions as defined by $|\_,\_|_{Cg'}$ if fed to $|\_,\_|_{Cg}$. Any label
replaced by $\phi$ is embedded by $\mathrm{REL}_2$, ensuring that the behavior of other label, basic
and test instructions is unaffected. This explains the use of $\mathrm{REL}_2$: it accommodates
the implicit relative jumps performed by these instructions.

We conclude with the observation that $g$ is left-right behavior preserving, but not uniformly
so. This is because the number of instructions output by $\phi$ depends on its input. $g$
can be made left-right uniformly behavior preserving by using an alternative definition of $\phi$
which always outputs three instructions:

\[
\phi : u \mapsto
\begin{cases}
/\#\#\pounds l;\ \backslash\pounds l;\ /\pounds l & \text{if } u = /\pounds l \text{ with } l > 2,\\
\backslash\pounds l;\ /\pounds l;\ \backslash\#\#\pounds l & \text{if } u = \backslash\pounds l \text{ with } l > 2,\\
u;\ /\#\#\pounds 1;\ /\#\#\pounds 2 & \text{if } u \text{ is a forward instruction and } u \neq /\pounds l \text{ for } l > 2,\\
\backslash\#\#\pounds 2;\ \backslash\#\#\pounds 1;\ u & \text{otherwise.}
\end{cases}
\]



Chapter 



Translating Instruction Sequences 



Previous chapters introduced the program algebra PGA and the code semigroups C and 
Cg. In this chapter we provide behavior preserving mappings between these algebras and 
show some properties of these translations. 

Though defined at a syntactic level, a behavior preserving mapping $f$ makes explicit
certain ways in which (groups of) instructions are related on a semantic level. If $f : A \to B$,
then $f$ tells us something about distinctions and similarities between code semigroups $A$ and
$B$. If $f : A \to A$, then $f$ (if it is not the identity function) can be seen as a reformulation
instead of a translation. Additionally, if $f$ is an (anti-)homomorphism then it provides some
additional implicit information about how $A$ and $B$ are related. Specifically, it shows that
an $A$-inseq $X$ can be translated instruction by instruction, independent of context, and
without taking the length of $X$ as an explicit parameter, to some $B$-inseq $Y$. For this reason
we aim to define homomorphic instead of arbitrary translations between code semigroups
where possible.¹

The translations defined in this chapter will aid us in proving some expressiveness results
in the next chapter. In order, this chapter provides a translation from C to PGA (§5.1),
from PGA to C (§5.2), from C to Cg (§5.3) and from Cg to C (§5.4).



5.1 Translating C to PGA 

In this section we define a behavior preserving mapping $\mathrm{c2pga} : \mathcal{I}_{C}^{+} \to P$. We do so
in three steps: the first two steps apply left behavior preserving mappings to C itself,
thereby converting every C-inseq $X$ to a behaviorally equivalent C-inseq $Y$ which has certain
structural properties. The third step exploits these properties in order to translate every
such $Y$ to a behaviorally equivalent PGA term $Z$. The translation presented here is based
on the behavior preserving mapping from C onto PGA as defined in section 12 of [BP09a].



1. PGA has basic instructions and test instructions whose semantics are identical to C's
forward basic and test instructions. C's backward basic instructions and test instructions
have no direct counterpart in PGA, so we wish to eliminate them. Thus we define a left
uniformly behavior preserving endomorphism $f$ on $\mathcal{I}_{C}^{+}$ which removes these backward
instructions. $f$ is defined on individual instructions as follows:

\[
\begin{array}{ll}
/a \mapsto /a;\ /\#2;\ \#, & \backslash a \mapsto /a;\ \backslash\#4;\ \#,\\
+/a \mapsto +/a;\ /\#2;\ /\#4, & +\backslash a \mapsto +/a;\ \backslash\#4;\ \backslash\#8,\\
-/a \mapsto -/a;\ /\#2;\ /\#4, & -\backslash a \mapsto -/a;\ \backslash\#4;\ \backslash\#8,\\
/\#k \mapsto /\#3k;\ \#;\ \#, & \# \mapsto \#;\ \#;\ \#,\\
\backslash\#k \mapsto \backslash\#3k;\ \#;\ \#, & {!} \mapsto {!};\ \#;\ \#.
\end{array}
\]

¹Thinking of $A$ as a high level programming language and $B$ as a lower level programming language or
even machine code, we can view $f$ as an interpreter or compiler. If $f$ is an (anti-)homomorphism then parts
of an $A$-inseq $X$ can be transformed and possibly even executed before all of $X$ has been read.
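
As an illustration of step 1, the instruction table for f can be written as a small Python function. The tuple encoding and the name `remove_backward` are assumptions made for this sketch and do not come from the text.

```python
FWD, BWD = "/", "\\"
ABORT, TERM = ("abort",), ("term",)

def remove_backward(X):
    """Sketch of the endomorphism f of step 1: each C instruction becomes
    three instructions, none of which is a backward basic or test instruction.
    Encoding (hypothetical): ("basic"|"pos"|"neg", dir, action),
    ("jump", dir, counter), ("abort",), ("term",)."""
    out = []
    for u in X:
        kind = u[0]
        if kind == "basic":
            # /a -> /a; /#2; #      \a -> /a; \#4; #
            hop = ("jump", FWD, 2) if u[1] == FWD else ("jump", BWD, 4)
            out += [("basic", FWD, u[2]), hop, ABORT]
        elif kind in ("pos", "neg"):
            # +/a -> +/a; /#2; /#4   +\a -> +/a; \#4; \#8   (same for -)
            d, near, far = (FWD, 2, 4) if u[1] == FWD else (BWD, 4, 8)
            out += [(kind, FWD, u[2]), ("jump", d, near), ("jump", d, far)]
        elif kind == "jump":
            # jump counters are tripled because every block has length 3
            out += [("jump", u[1], 3 * u[2]), ABORT, ABORT]
        else:
            # # -> #; #; #      ! -> !; #; #
            out += [u, ABORT, ABORT]
    return out

X = [("basic", BWD, "a"), ("pos", FWD, "b"), ("jump", BWD, 2), TERM]
Y = remove_backward(X)
```

The jump distances 2, 4 and 8 all lead from the second or third slot of a block to the first slot of a neighbouring block, which is why the first slot always holds the translated instruction itself.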

2. In [BP09a] the notion of C-programs is introduced. In essence, a C-program is a C-inseq
which does not contain exit positions. I.e., no instruction transfers control of execution
outside of the instruction sequence; only execution of the termination or abort instruction
will cause program execution to halt. Every C-inseq $X$ can be converted to a C-program,
simply by prefixing and suffixing sufficiently many abort instructions. In order to maintain
$X$'s left and right behavior, additional jump instructions must be added to its left
and right. Let $m \geq 2$ be an upper bound on the largest jump counter present in some
C-inseq $X$. Then a left-right behaviorally equivalent C-program $X'$ can be constructed
as

\[
/\#m{+}1;\ (\#)^{m};\ X;\ (\#)^{m};\ \backslash\#m{+}1.
\]

Let $g$ be the left-right behavior preserving mapping which performs the above procedure
for arbitrary C-inseqs.

3. Given $f$ and $g$ as defined in the previous two steps, it is immediate that for every C-inseq
$X$ there exists a left behaviorally equivalent C-program $X' = g(f(X))$ which does not
contain backward basic instructions or backward test instructions. Let $X' = u_1; \ldots; u_n$. Then the
following is a behaviorally equivalent PGA term:

\[
(\psi_n(u_1);\ \ldots;\ \psi_n(u_n))^{\omega}.
\]

For all $n \in \mathbb{N}^{+}$ the function $\psi_n$ is defined as follows (observe that due to application of
$g$, necessarily $k < n$ and thus $n - k \in \mathbb{N}^{+}$):

\[
+/a \mapsto +a, \qquad \backslash\#k \mapsto \#n{-}k,
\]

Denoting the above procedure by $h$, we have that $\mathrm{c2pga} = h \circ g \circ f$.



5.2 Translating PGA to C 

Defining a translation $\mathrm{pga2c} : P \to \mathcal{I}_{C}^{+}$ turns out to be a lot easier if PGA terms can be
assumed to be in second canonical form. Hence we start out by defining

\[
\mathrm{pga2c} = \mathrm{snd2c} \circ \mathrm{snd}.
\]

Recall that $\mathrm{snd} : P \to P_2$ is the function defined in §2.2.2 which converts arbitrary PGA
terms to their structurally (and behaviorally) equivalent minimal second canonical forms.
The mapping $\mathrm{snd2c} : P_2 \to \mathcal{I}_{C}^{+}$ is a behavior preserving mapping defined on second canonical
forms only. Any $X \in P_2$ does not contain chained jump instructions and has one of
two forms:

• $X$ does not contain repetition and thus $X = u_1; u_2; \ldots; u_n$ for some $n \in \mathbb{N}^{+}$. We
define

\[
\mathrm{snd2c}(u_1; u_2; \ldots; u_n) = \psi(u_1);\ \psi(u_2);\ \ldots;\ \psi(u_n).
\]

• $X = Y; Z^{\omega}$, and neither $Y$ nor $Z$ contains repetition, meaning that for some $n, m \in \mathbb{N}^{+}$,
$X = u_1; \ldots; u_n; (u_{n+1}; \ldots; u_{n+m})^{\omega}$. Now we define

\[
\mathrm{snd2c}(u_1; \ldots; u_n; (u_{n+1}; \ldots; u_{n+m})^{\omega}) = \psi(u_1);\ \ldots;\ \psi(u_{n+m});\ (\backslash\#m)^{\max(2,\,m-1)}.
\]



The function $\psi$ is as straightforward as can be:

\[
\psi(u) =
\begin{cases}
\# & \text{if } u = \#0,\\
/u & \text{otherwise.}
\end{cases}
\]

$\mathrm{snd2c}$ makes extensive use of the assumptions that can be made about its input (i.e., that
it is in second canonical form). Any jump instruction $u_i$ with $i \leq n$ will not jump beyond
$u_{n+m}$. Any jump instruction $u_i$ with $i > n$ will not have a jump counter greater than $m - 1$.
By appending $\max(2, m-1)$ instructions, it is ensured that all jump instructions which
transfer control of execution beyond $\psi(u_{n+m})$ indirectly transfer control to the appropriate
instruction. Since $u_{n+m-1}$ and $u_{n+m}$ can be test instructions, it is important to append at
least two backward jump instructions.



5.3 Translating C to Cg 

In this section we focus on translations from C to Cg. It turns out that there does not
exist a homomorphism which translates arbitrary C-expressions to behaviorally equivalent
Cg-expressions. Theorem 5.1 below gives a proof of this fact.

A convenient way to translate C to Cg is to start out by categorizing every C-expression
based on the largest jump counter it contains. We write $C_{\leq k}$ for the subsemigroup of C
which consists exactly of those C-expressions that do not contain instructions /#k' or
\#k' for $k' > k$. Formally, $C_{\leq k} = (\mathcal{I}_{C_{\leq k}}^{+}, ;)$ with²

\[
\mathcal{I}_{C_{\leq k}} = \mathcal{I}_{C} - \{u \in \mathfrak{J} \mid \delta(u) > k\}. \tag{5.1}
\]

Assume the existence of a family of behavior preserving mappings $\mathrm{c2cg}_k : \mathcal{I}_{C_{\leq k}}^{+} \to \mathcal{I}_{Cg}^{+}$ for
all $k \in \mathbb{N}$. Writing $\mathrm{c2cg}(k, X)$ for $\mathrm{c2cg}_k(X)$, the behavior preserving mapping $\mathrm{c2cg} : \mathcal{I}_{C}^{+} \to \mathcal{I}_{Cg}^{+}$
can then be defined on all $X \in \mathcal{I}_{C}^{+}$ as,³

\[
\mathrm{c2cg} : X \mapsto \mathrm{c2cg}(\max\{\delta(\sigma_i(X)) \mid i \in \mathfrak{J}(X)\},\, X).
\]

The hypothesized family of functions $\mathrm{c2cg}_k$ exists. A straightforward definition is (5.3) in
§5.3.1 below. An alternative homomorphic definition is (5.5) in §5.3.2. Since in both cases
$\mathrm{c2cg}_k$ is only defined for $k \geq 2$, a slightly altered definition of $\mathrm{c2cg} : \mathcal{I}_{C}^{+} \to \mathcal{I}_{Cg}^{+}$ is in place:

\[
\mathrm{c2cg} : X \mapsto \mathrm{c2cg}(\max(\{\delta(\sigma_i(X)) \mid i \in \mathfrak{J}(X)\} \cup \{2\}),\, X). \tag{5.2}
\]

5.3.1 A Behavior Preserving Mapping from C≤k to Cg

For all $k \geq 2$, we define a function $\mathrm{c2cg}_k : \mathcal{I}_{C_{\leq k}}^{+} \to \mathcal{I}_{Cg}^{+}$ such that,

\[
\mathrm{c2cg}_k(u_1; \ldots; u_n) = \psi_{k,1}(u_1);\ \ldots;\ \psi_{k,n}(u_n). \tag{5.3}
\]



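²Recall that $\delta : \mathfrak{J} \to \mathbb{N}^{+}$ returns the jump counter of a given jump instruction.

³Yes, the function name c2cg is overloaded here. Its type is either $\mathbb{N} \times \mathcal{I}_{C}^{+} \to \mathcal{I}_{Cg}^{+}$ or simply $\mathcal{I}_{C}^{+} \to \mathcal{I}_{Cg}^{+}$.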
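In effect $\mathrm{c2cg}_k$ replaces the $i$th instruction of its input $X$ with the output of $\psi_{k,i}(\sigma_i(X))$.
The auxiliary functions $\psi_{k,i} : \mathcal{I}_{C_{\leq k}} \to \mathcal{I}_{Cg}^{+}$ are defined as follows:

\[
\psi_{k,i}(u) =
\begin{cases}
\phi_{k,i}(/a;\ /\#\#\pounds[i{+}1]_{k+1}) & \text{if } u = /a,\\
\phi_{k,i}(+/a;\ /\#\#\pounds[i{+}1]_{k+1};\ /\#\#\pounds[i{+}2]_{k+1}) & \text{if } u = +/a,\\
\phi_{k,i}(-/a;\ /\#\#\pounds[i{+}1]_{k+1};\ /\#\#\pounds[i{+}2]_{k+1}) & \text{if } u = -/a,\\
\phi_{k,i}(/a;\ \backslash\#\#\pounds[i{-}1]_{k+1}) & \text{if } u = \backslash a,\\
\phi_{k,i}(+/a;\ \backslash\#\#\pounds[i{-}1]_{k+1};\ \backslash\#\#\pounds[i{-}2]_{k+1}) & \text{if } u = +\backslash a,\\
\phi_{k,i}(-/a;\ \backslash\#\#\pounds[i{-}1]_{k+1};\ \backslash\#\#\pounds[i{-}2]_{k+1}) & \text{if } u = -\backslash a,\\
\phi_{k,i}(/\#\#\pounds[i{+}l]_{k+1}) & \text{if } u = /\#l,\\
\phi_{k,i}(\backslash\#\#\pounds[i{-}l]_{k+1}) & \text{if } u = \backslash\#l,\\
\phi_{k,i}(\#) & \text{if } u = \#,\\
\phi_{k,i}(!) & \text{if } u = {!}.
\end{cases}
\]

In this definition $[n]_{k+1}$ stands for the remainder of $n$ after division by $k + 1$, i.e. the smallest
nonnegative value congruent with $n$ (mod $k + 1$). Thus $0 \leq [n]_{k+1} \leq k$ for all $n$. For all
$i \in \mathbb{N}^{+}$, $\phi_{k,i} : \mathcal{I}_{Cg}^{+} \to \mathcal{I}_{Cg}^{+}$ embeds its argument between some label and goto instructions
with label number $[i]_{k+1}$ as follows:

\[
\phi_{k,i}(U) = /\#\#\pounds[i]_{k+1};\ \backslash\pounds[i]_{k+1};\ /\pounds[i]_{k+1};\ U;\ \backslash\#\#\pounds[i]_{k+1}.
\]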

Informally, $\phi_{k,i}(U)$ "guards" the Cg instruction sequence $U$ which replaces the C instruction
at position $i$ in the original C-expression using the labels /£$[i]_{k+1}$ and \£$[i]_{k+1}$. In this way
a goto instruction /##£$[i+l]_{k+1}$ or \##£$[i-l]_{k+1}$ in a Cg-inseq $U$ which replaces the
$i$th C instruction transfers execution to the Cg-inseq $U'$ which replaces the C instruction
at position $i + l$ or $i - l$, respectively. Thus the transfer of control of execution over a
relative distance in the original C-inseq is simulated.

Observe that label numbers are repeated ("reused") with period $k + 1$. This does not
pose a problem because the original $C_{\leq k}$-expression will not contain relative jumps over a
distance greater than $k$. (And since $k \geq 2$, the implicit relative jumps over distance 1 or 2
performed by test instructions can likewise be simulated.)

The auxiliary functions $\psi_{k,i}$ and their helper functions $\phi_{k,i}$ are defined such that $\mathrm{c2cg}_k$
is left-right behavior preserving. Note that it is possible to omit the rightmost \##£$[i]_{k+1}$
instruction output by each call to $\phi_{k,i}$, but then $\mathrm{c2cg}_k$ would no longer be right behavior
preserving.
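
The position-dependent translation (5.3) can be sketched in Python as follows. The tuple encoding and the names `guard`, `psi` and `c2cg` are assumptions made for this illustration. The sketch also assumes that the abort and termination instructions are guarded like every other instruction.

```python
FWD, BWD = "/", "\\"

def guard(k, i, U):
    """phi_{k,i}: wrap U in the labels and gotos numbered [i]_{k+1}."""
    r = i % (k + 1)
    return [("goto", FWD, r), ("label", BWD, r), ("label", FWD, r)] + U + [("goto", BWD, r)]

def psi(k, i, u):
    """psi_{k,i}: Cg replacement of the C instruction u found at position i."""
    m = k + 1
    fwd = lambda d: ("goto", FWD, (i + d) % m)   # simulates a forward jump over d
    bwd = lambda d: ("goto", BWD, (i - d) % m)   # simulates a backward jump over d
    kind = u[0]
    if kind == "basic":
        body = [("basic", FWD, u[2]), fwd(1) if u[1] == FWD else bwd(1)]
    elif kind in ("pos", "neg"):
        hop = fwd if u[1] == FWD else bwd
        body = [(kind, FWD, u[2]), hop(1), hop(2)]
    elif kind == "jump":
        body = [fwd(u[2]) if u[1] == FWD else bwd(u[2])]
    else:                                        # abort or termination
        body = [u]
    return guard(k, i, body)

def c2cg(k, X):
    """c2cg_k: only meaningful for k >= 2 and jump counters <= k."""
    assert k >= 2 and all(u[0] != "jump" or u[2] <= k for u in X)
    return [v for i, u in enumerate(X, start=1) for v in psi(k, i, u)]

X = [("jump", FWD, 2), ("basic", FWD, "a"), ("basic", FWD, "b")]
Y = c2cg(2, X)
```

Python's `%` operator returns a nonnegative remainder for a positive modulus, so `(i - d) % m` matches the convention $0 \leq [n]_{k+1} \leq k$ even when $i - d$ is negative.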



5.3.2 What About a Homomorphic Translation from C to Cg? 

The translation $\mathrm{c2cg} : \mathcal{I}_{C}^{+} \to \mathcal{I}_{Cg}^{+}$ defined by (5.2) is not homomorphic because it requires
knowledge about the largest jump counter present in its input. It turns out that it is not
possible to define a homomorphic alternative to c2cg.

Theorem 5.1. There does not exist a behavior preserving homomorphism $f : \mathcal{I}_{C}^{+} \to \mathcal{I}_{Cg}^{+}$.

Proof. We prove that no homomorphism $f : \mathcal{I}_{C}^{+} \to \mathcal{I}_{Cg}^{+}$ can be left behavior preserving. The
proof that no such $f$ can be right behavior preserving is analogous.
For all $n \in \mathbb{N}^{+}$ we define the following C-inseqs:

\[
\begin{aligned}
\mathrm{NODE}_n &= +/a;\ /\#3n{-}1;\ /\#3n{+}1,\\
\mathrm{TREE}_n &= \mathrm{NODE}_1;\ \mathrm{NODE}_2;\ \ldots;\ \mathrm{NODE}_{2^n-1}.
\end{aligned}
\tag{5.4}
\]

Observe that $\mathrm{TREE}_n$ contains $2^n$ exit positions (see Definition 3.4), each containing one of
the rightmost $2^n$ forward jump instructions of $\mathrm{TREE}_n$. Exactly one of these exit positions
will be reached after $n$ consecutive $a$-tests, provided that execution starts at position 1.
Every instruction in $\mathrm{TREE}_n$ is reachable from position 1. Figure 5.1 provides a graphical
representation of $\mathrm{TREE}_3$.

[Figure 5.1: binary tree with nodes 1: NODE_1, 4: NODE_2, 7: NODE_3, 10: NODE_4, 13: NODE_5, 16: NODE_6, 19: NODE_7, followed by X]

Figure 5.1: Graphical representation of the C-expression $\mathrm{TREE}_3; X$ as defined in (5.4). The
dashed arrows show the order in which the subexpressions $\mathrm{NODE}_1, \ldots, \mathrm{NODE}_7$ are concatenated
(the prefixes denote their positions in $\mathrm{TREE}_3$). The solid arrows signify jumps between
which a choice is made based on the boolean reply to the $a$-test in the originating node. As
depicted here, all instructions at exit positions of $\mathrm{TREE}_3$ jump to distinct positions within
the inseq $X$. This means that $\ell(X) \geq 22$.
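
The counting claims about $\mathrm{TREE}_n$ can be checked with a short Python sketch. The tuple encoding and the function names are assumptions of this illustration. Since Definition 3.4 is not repeated here, `exit_positions` uses a simplified notion of exit position: a forward jump whose target lies beyond the end of the sequence.

```python
def node(n):
    """NODE_n = +/a; /#3n-1; /#3n+1 in a hypothetical tuple encoding."""
    return [("pos", "/", "a"), ("jump", "/", 3 * n - 1), ("jump", "/", 3 * n + 1)]

def tree(n):
    """TREE_n = NODE_1; NODE_2; ...; NODE_{2^n - 1}."""
    return [u for m in range(1, 2 ** n) for u in node(m)]

def exit_positions(X):
    """1-based positions of forward jumps that leave X (simplified notion)."""
    return [i for i, u in enumerate(X, start=1)
            if u[0] == "jump" and u[1] == "/" and i + u[2] > len(X)]

def exit_offsets(X):
    """How far beyond the end of X each exiting jump lands."""
    return sorted(i + X[i - 1][2] - len(X) for i in exit_positions(X))

T3 = tree(3)
```

For $\mathrm{TREE}_3$ the eight exiting jumps land at eight distinct offsets, the largest being 22, which matches the bound $\ell(X) \geq 22$ in the caption of Figure 5.1.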

Towards a contradiction we will now assume that there does exist a left behavior preserving
homomorphism $f$ from the code semigroup C onto the code semigroup Cg.

It is easy to see that for any combination of $m \leq 2^n$ exit positions $i_1, i_2, \ldots, i_m$ in $\mathrm{TREE}_n$
there exists some $X \in \mathcal{I}_{C}^{+}$ such that all of the following yield distinct behavior:⁴

\[
|\mathrm{TREE}_n; X|^{i_1}_{C},\ |\mathrm{TREE}_n; X|^{i_2}_{C},\ \ldots,\ |\mathrm{TREE}_n; X|^{i_m}_{C}.
\]

It follows that $f(\mathrm{TREE}_n)$ must have at least $2^n - 2$ distinct orphaned forward goto instructions,
all of which are reachable from the leftmost instruction.⁵

For all $X \in \mathcal{I}_{Cg}^{+}$, let $L_X = X \cap \mathfrak{L}^{\rightarrow}$ be the set of distinct forward label instructions in
$X$. Obviously $L_X = L_{X^k}$ for all $k \in \mathbb{N}^{+}$.

Now take some $n, k \in \mathbb{N}$ such that $2^n - 2 > |L_{f(/a)}|$ and $k \geq 3(2^n - 1) + 1$. Then
$|\mathrm{TREE}_n; (/a)^k|^{\rightarrow}_{C}$ will perform at least $n + 1$ consecutive $a$-actions, irrespective of the boolean
replies they yield. However, this cannot be the case for $|f(\mathrm{TREE}_n; (/a)^k)|^{\rightarrow}_{Cg}$. Some of the
forward goto instructions in $f(\mathrm{TREE}_n)$ which are reachable after $n$ $a$-tests cannot have a
matching label instruction in $f((/a)^k)$, because the number of distinct forward label instructions
$|L_{f((/a)^k)}| = |L_{(f(/a))^k}| = |L_{f(/a)}|$ is smaller than the number of distinct forward
goto instructions (which is at least $2^n - 2$). Thus we reach a contradiction. □

A Behavior Preserving Homomorphism from C≤k to Cg. It turns out that the
result of Theorem 5.1 is due to a surprisingly small lack of information about the context of
individual instructions. Once an upper bound on the size of jump counters in the input inseq
is known, a homomorphism can be defined. In other words, there does exist a homomorphic
alternative to the family of behavior preserving mappings $\mathrm{c2cg}_k : \mathcal{I}_{C_{\leq k}}^{+} \to \mathcal{I}_{Cg}^{+}$ defined by
(5.3) in §5.3.1. We provide one such alternative definition, by building on the work of §4.5.
For all $k \geq 2$ we define,

\[
\mathrm{c2cg}_k = \mathrm{REL}_k \circ \phi. \tag{5.5}
\]
⁴In fact, infinitely many inseqs $X$ have this property.

⁵We do not exclude the possibility that either or both of the rightmost two instruction positions of
$f(\mathrm{TREE}_n)$ are exit positions containing forward basic instructions, test instructions or label instructions.
This explains the conservative estimate of $2^n - 2$ instead of $2^n$ orphaned forward goto instructions.






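The homomorphism $\mathrm{REL}_k : \mathcal{I}_{Cg}^{+} \to \mathcal{I}_{Cg}^{+}$ is defined by (4.6) in §4.5. Recall that it causes all
goto instructions with label numbers up to and including $k$ to behave as relative jumps. It
should come as no surprise then that the definition of the homomorphism $\phi : \mathcal{I}_{C_{\leq k}}^{+} \to \mathcal{I}_{Cg}^{+}$ is
straightforward:

\[
\phi : u \mapsto
\begin{cases}
/\#\#\pounds l & \text{if } u = /\#l,\\
\backslash\#\#\pounds l & \text{if } u = \backslash\#l,\\
u & \text{otherwise.}
\end{cases}
\]

Observe that $\mathrm{c2cg}_k$ is left-right uniformly behavior preserving. (Like $\mathrm{REL}_k$, $\mathrm{c2cg}_k$ maps
every instruction in the input instruction sequence to $4k + 6$ instructions in the output.)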



5.4 Translating Cg to C 

Defining a behavior preserving mapping $\mathrm{cg2c} : \mathcal{I}_{Cg}^{+} \to \mathcal{I}_{C}^{+}$ is rather straightforward. Label
instructions can simply be replaced by relative jumps over distance 1 in the appropriate
direction. Goto instructions are replaced by relative jumps to the position of the label
instruction which they target, if any. Orphaned goto instructions can be replaced by an
abort instruction or a jump outside of the instruction sequence. For convenience we will
choose to do the latter.

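For all $i \in \mathbb{N}^{+}$ we define functions $\phi_i : \mathcal{I}_{Cg}^{+} \to \mathcal{I}_{C}$ such that,

\[
\phi_i(X) =
\begin{cases}
/\#j{-}i & \text{if } \sigma_i(X) = /\#\#\pounds l \text{ and } j = \overrightarrow{\mathrm{search}}(X, i, \{/\pounds l\}),\\
\backslash\#i{-}j & \text{if } \sigma_i(X) = \backslash\#\#\pounds l \text{ and } j = \overleftarrow{\mathrm{search}}(X, i, \{\backslash\pounds l\}),\\
/\#1 & \text{if } \sigma_i(X) = /\pounds l,\\
\backslash\#1 & \text{if } \sigma_i(X) = \backslash\pounds l,\\
\sigma_i(X) & \text{otherwise.}
\end{cases}
\tag{5.6}
\]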



$\phi_i$ determines whether and how the $i$th instruction in a given Cg-inseq $X$ should be translated.
Only label and goto instructions are replaced, precisely according to the rules
mentioned. Concatenating the results of appropriate invocations of (5.6), the mapping
$\mathrm{cg2c} : \mathcal{I}_{Cg}^{+} \to \mathcal{I}_{C}^{+}$ is thus defined:

\[
\mathrm{cg2c} : X \mapsto \phi_1(X);\ \phi_2(X);\ \ldots;\ \phi_{\ell(X)}(X). \tag{5.7}
\]

Every label and goto instruction is replaced by a jump instruction which mimics its transfer
of control of execution. Other instructions are unaltered. Thus cg2c is left-right uniformly
behavior preserving.
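
The mapping (5.7) can be sketched in Python as below. The tuple encoding and the names `search` and `cg2c` are chosen for this illustration. Two further assumptions are made: a goto targets the nearest matching label in its direction, and an orphaned goto becomes a jump to the first position just outside the sequence.

```python
def search(X, i, direction, number):
    """Position of the nearest label with the given direction and number,
    strictly beyond position i in that direction, or None if absent."""
    step = 1 if direction == "/" else -1
    j = i + step
    while 1 <= j <= len(X):
        if X[j - 1] == ("label", direction, number):
            return j
        j += step
    return None

def cg2c(X):
    """Replace labels by jumps over distance 1 and gotos by relative jumps."""
    out = []
    for i, u in enumerate(X, start=1):
        if u[0] == "label":
            out.append(("jump", u[1], 1))
        elif u[0] == "goto":
            j = search(X, i, u[1], u[2])
            if j is None:
                # Orphaned goto: jump just outside the sequence (assumption).
                j = len(X) + 1 if u[1] == "/" else 0
            out.append(("jump", u[1], abs(j - i)))
        else:
            out.append(u)
    return out

X = [("goto", "/", 1), ("basic", "/", "a"), ("label", "/", 1),
     ("basic", "/", "b"), ("goto", "\\", 7)]
Y = cg2c(X)
```

The translation needs the whole of X to compute each jump counter. This is the context dependence that Theorem 5.2 shows to be unavoidable for arbitrary Cg-inseqs.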

5.4.1 What About a Homomorphic Translation from Cg to C? 

The translation cg2c defined by (5.7) is not a homomorphism. It turns out that this is
necessarily so.

Theorem 5.2. There does not exist a behavior preserving homomorphism $f : \mathcal{I}_{Cg}^{+} \to \mathcal{I}_{C}^{+}$.

Proof. We prove that no homomorphism $f : \mathcal{I}_{Cg}^{+} \to \mathcal{I}_{C}^{+}$ can be left behavior preserving. The
proof that no such $f$ can be right behavior preserving is analogous.
For all $n \in \mathbb{N}^{+}$ we define the following Cg-inseqs:

\[
\begin{aligned}
\mathrm{NODE}_n &= /\pounds n;\ +/a;\ /\#\#\pounds 2n;\ /\#\#\pounds 2n{+}1,\\
\mathrm{TREE}_n &= \mathrm{NODE}_1;\ \mathrm{NODE}_2;\ \ldots;\ \mathrm{NODE}_{2^n-1}.
\end{aligned}
\]
It is not hard to see that $\mathrm{TREE}_n$ contains $2^n$ orphaned goto instructions with label numbers
$2^n$ through $2^{n+1} - 1$. For example, $\mathrm{TREE}_2$ contains the orphaned goto instructions /##£4,
/##£5, /##£6 and /##£7:

/£1; +/a; /##£2; /##£3;
/£2; +/a; /##£4; /##£5;
/£3; +/a; /##£6; /##£7.

If execution of $\mathrm{TREE}_n$ starts at position 1, then exactly one of the orphaned goto instructions
will be reached after performing $n$ consecutive $a$-actions. Every orphaned goto instruction
is reachable.

Towards a contradiction we will now assume that there does exist a left behavior preserving
homomorphism $f$ from the code semigroup Cg onto the code semigroup C.

For all $X \in \mathcal{I}_{C}^{+}$ define $R_X = \{j - \ell(X) \mid i \in [1, \ell(X)],\ i \to_X j,\ j > \ell(X)\}$. Informally, $R_X$
contains the offsets of "invalid" positions to the right of $X$ which are reachable from $X$. We
fix some $r$ such that $\max(R_{f(/a)}) \leq \ell(f((/a)^r))$. Then $R_{f(/a;(/a)^r)} = R_{f((/a)^r)}$, and in fact

\[
R_{f((/a)^k)} = R_{f((/a)^r)} \quad \text{for all } k \geq r.
\]

Next we define $\mathrm{TREE}_{n,k} = \mathrm{TREE}_n; (/a)^k$ for all $n, k \in \mathbb{N}$, and we make two easily verifiable
claims:

(1) For all $n \in \mathbb{N}$, $k, k' \geq 2$ and $X \in \mathcal{I}_{Cg}^{+}$ the identity $|\mathrm{TREE}_{n,k}; X|^{\leftarrow}_{Cg} = |\mathrm{TREE}_{n,k'}; X|^{\leftarrow}_{Cg}$
holds. To see why this is so, observe that all exit positions in $\mathrm{TREE}_{n,k}$ and $\mathrm{TREE}_{n,k'}$
are goto instructions and that $\mathrm{TREE}_{n,k}$ and $\mathrm{TREE}_{n,k'}$ do not contain backward label
instructions. As a result only the last two instructions of $\mathrm{TREE}_{n,k}$ and $\mathrm{TREE}_{n,k'}$ (which
are /a instructions) may be reachable from a position in the "X-part" of $\mathrm{TREE}_{n,k}; X$
and $\mathrm{TREE}_{n,k'}; X$.

(2) For any combination of $m \leq 2^n$ distinct positions of orphaned goto instructions $i_1$,
$i_2, \ldots, i_m$ within $\mathrm{TREE}_{n,k}$ there exists an $X \in \mathcal{I}_{Cg}^{+}$ such that all of the following yield
distinct behavior:⁶

\[
|\mathrm{TREE}_{n,k}; X|^{i_1}_{Cg},\ |\mathrm{TREE}_{n,k}; X|^{i_2}_{Cg},\ \ldots,\ |\mathrm{TREE}_{n,k}; X|^{i_m}_{Cg}.
\]

Combining these two claims, we must conclude that $|R_{f(\mathrm{TREE}_{n,k})}| \geq 2^n$ for all $n, k \in \mathbb{N}$.
Now take some $n$ such that $2^n > |R_{f((/a)^r)}|$ and select some $k \geq r$ such that $R_{f(\mathrm{TREE}_{n,k})} =
R_{f(\mathrm{TREE}_n;(/a)^k)} = R_{f((/a)^k)} = R_{f((/a)^r)}$. But then $|R_{f(\mathrm{TREE}_{n,k})}| = |R_{f((/a)^r)}| < 2^n$. Contradiction. □



A Behavior Preserving Homomorphism from Cg≤k to C. Similar to the definition
of subsemigroups $C_{\leq k}$, we define subsemigroups $Cg_{\leq k} \subseteq Cg$ for all $k \in \mathbb{N}$. $Cg_{\leq k}$ contains
precisely those Cg-inseqs which do not contain goto instructions with a label number greater
than $k$. That is, we define $Cg_{\leq k} = (\mathcal{I}_{Cg_{\leq k}}^{+}, ;)$ with

\[
\mathcal{I}_{Cg_{\leq k}} = \mathcal{I}_{Cg} - \{u \in \mathfrak{G} \mid \lambda(u) > k\}.
\]

Note that $Cg_{\leq k}$ places no restriction on label instructions. As such, the utility of label
instructions with a label number greater than $k$ in a $Cg_{\leq k}$-expression is limited.

As per Theorem 5.2 no total homomorphism from Cg to C can be behavior preserving.
However, the family of behavior preserving functions $\mathrm{cg2c}_k : \mathcal{I}_{Cg_{\leq k}}^{+} \to \mathcal{I}_{C}^{+}$ ($k \in \mathbb{N}$) can be
defined such that each $\mathrm{cg2c}_k$ is a homomorphism. Given arbitrary $k$, we define $\mathrm{cg2c}_k$ on
individual instructions as follows:

\[
\mathrm{cg2c}_k(u) = \phi_k(u);\ \mathrm{NEXT}_k(u);\ \mathrm{LEFT}_k(u);\ \mathrm{RIGHT}_k(u). \tag{5.8}
\]

⁶Note again that there are in fact infinitely many such $X$.
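Here $\phi_k$ is defined as:

\[
\begin{array}{llll}
/a \mapsto /a, & \backslash a \mapsto /a, & /\pounds l \mapsto /\#1, & /\#\#\pounds l \mapsto /\#k{+}l{+}4,\\
+/a \mapsto +/a, & +\backslash a \mapsto +/a, & \backslash\pounds l \mapsto /\#1, & \backslash\#\#\pounds l \mapsto /\#l{+}3,\\
-/a \mapsto -/a, & -\backslash a \mapsto -/a, & \# \mapsto \#, & {!} \mapsto {!}.
\end{array}
\]

Furthermore, $\mathrm{NEXT}_k$, $\mathrm{LEFT}_k$ and $\mathrm{RIGHT}_k$ are defined as follows:

\[
\begin{aligned}
\mathrm{NEXT}_k : u &\mapsto
\begin{cases}
\backslash\#2k{+}6;\ \backslash\#4k{+}12 & \text{if } u \text{ is a backward instruction},\\
/\#2k{+}4;\ /\#4k{+}8 & \text{otherwise,}
\end{cases}\\[4pt]
\mathrm{LEFT}_k : u &\mapsto
\begin{cases}
(\backslash\#2k{+}5)^{l};\ \backslash\#l{+}3;\ (\backslash\#2k{+}5)^{k-l} & \text{if } u = \backslash\pounds l \text{ and } l \leq k,\\
(\backslash\#2k{+}5)^{k+1} & \text{otherwise,}
\end{cases}\\[4pt]
\mathrm{RIGHT}_k : u &\mapsto
\begin{cases}
(/\#2k{+}5)^{l};\ \backslash\#k{+}l{+}4;\ (/\#2k{+}5)^{k-l} & \text{if } u = /\pounds l \text{ and } l \leq k,\\
(/\#2k{+}5)^{k+1} & \text{otherwise.}
\end{cases}
\end{aligned}
\]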



The mapping $\mathrm{CG2C}_k : X \mapsto Y$ can be explained using the metaphor of a "highway" that is laid between successive instructions of $X$. The highway contains a dedicated lane for each goto instruction $/\#\#\pounds l$ and $\backslash\#\#\pounds l$ for $0 \le l \le k$, thus resulting in a highway with $2k + 2$ lanes. The highway is the result of the functions $\mathrm{LEFT}_k$ and $\mathrm{RIGHT}_k$. Each $Cg_{\le k}$-instruction is mapped onto $2k + 5$ $C$-instructions:

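$$\underbrace{u;\ v;\ w;\ \underbrace{\backslash\#2k{+}5;\ \ldots;\ \backslash\#2k{+}5}_{k+1 \text{ "lanes" to the left}};\ \underbrace{/\#2k{+}5;\ \ldots;\ /\#2k{+}5}_{k+1 \text{ "lanes" to the right}}}_{\text{these } 2k+5\ C\text{-instructions represent a single } Cg_{\le k}\text{-instruction}}$$

The $2k + 2$ jump instructions following $u; v; w$ form the label/goto "highway" with $2k + 2$ "lanes".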

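The highway is used solely to mimic the behavior of goto instructions using a finite number of jumps. The following $C$-inseq is yielded by $\mathrm{CG2C}_k(/\#\#\pounds l)$:

$$/\#k{+}l{+}4;\ /\#2k{+}4;\ /\#4k{+}8;\ \underbrace{(\backslash\#2k{+}5)^{k+1}}_{\mathrm{LEFT}_k(/\#\#\pounds l)};\ \underbrace{(/\#2k{+}5)^{l};\ /\#2k{+}5;\ (/\#2k{+}5)^{k-l}}_{\mathrm{RIGHT}_k(/\#\#\pounds l)}$$

The first instruction enters the "highway": it jumps over the next $k + l + 3$ instructions.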

The intention here is that the effect of $/\#\#\pounds l$ is to jump onto the $l$th highway lane to the right. This lane consists of chained jumps, each of distance $2k + 5$, until the segment of $C$-instructions that is the result of $\mathrm{CG2C}_k(/\pounds l)$ (note that $l \le k$, for otherwise $/\#\#\pounds l$ would not be part of the input). There, a jump instruction off the highway can be found:

$$/\#1;\ /\#2k{+}4;\ /\#4k{+}8;\ \underbrace{(\backslash\#2k{+}5)^{k+1}}_{\mathrm{LEFT}_k(/\pounds l)};\ \underbrace{(/\#2k{+}5)^{l};\ \backslash\#k{+}l{+}4;\ (/\#2k{+}5)^{k-l}}_{\mathrm{RIGHT}_k(/\pounds l)}$$

The instruction $\backslash\#k{+}l{+}4$ leaves the "highway": it jumps back over $k + l + 3$ instructions to the first instruction of the segment.

$\mathrm{CG2C}_k$ maps each $Cg_{\le k}$-instruction in an inseq $X$ onto $2k + 5$ $C$-instructions in an inseq $Y$. Thus the $C$-instructions corresponding to the $i$th instruction in $X$ start in $Y$ at position

$$(i - 1) \cdot (2k + 5) + 1.$$

It follows that $|X|^{i}_{Cg} = |\mathrm{CG2C}_k(X)|^{(i-1)\cdot(2k+5)+1}_{C}$ for all $i \in \mathbb{Z}$, $k \in \mathbb{N}$ and $X \in \mathcal{I}_{Cg_{\le k}}$. Thus $\mathrm{CG2C}_k$ is left uniformly behavior preserving.
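The block structure of the translation can be checked mechanically. The following is a minimal Python sketch, not the thesis's own implementation: the tuple encoding of instructions, the function names `cg2c`, `translate` and `landing_block`, and the 0-based positions are all assumptions made for illustration. The offset `l + 3` used for backward gotos is inferred from the position of the exit jump in LEFT_k rather than read off directly.

```python
# Hypothetical encoding of Cg instructions: (kind, direction, argument)
#   ("lab",  "/" or "\\", l)   label instruction with label number l
#   ("goto", "/" or "\\", l)   goto instruction with label number l
#   ("op",   "/" or "\\", s)   basic, test or termination instruction, s e.g. "+a"
# C instructions in the output are ("jmp", d) with signed distance d, or ("op", s).

def cg2c(u, k):
    """Image of one instruction: phi_k(u); NEXT_k(u); LEFT_k(u); RIGHT_k(u)."""
    kind, direction, arg = u
    width = 2 * k + 5                      # every image has 2k + 5 instructions
    if kind == "lab":
        head = ("jmp", 1)
    elif kind == "goto":
        # forward gotos enter right lane l, backward gotos enter left lane l
        head = ("jmp", k + arg + 4) if direction == "/" else ("jmp", arg + 3)
    else:
        head = ("op", arg)                 # backward instructions become forward ones
    if direction == "\\":
        nxt = [("jmp", -(2 * k + 6)), ("jmp", -(4 * k + 12))]
    else:
        nxt = [("jmp", 2 * k + 4), ("jmp", 4 * k + 8)]
    left = [("jmp", -width)] * (k + 1)     # chained lanes to the left
    right = [("jmp", width)] * (k + 1)     # chained lanes to the right
    if kind == "lab" and arg <= k:         # a label provides the exit of its lane
        if direction == "\\":
            left[arg] = ("jmp", -(arg + 3))
        else:
            right[arg] = ("jmp", -(k + arg + 4))
    return [head] + nxt + left + right

def translate(xs, k):
    return [v for u in xs for v in cg2c(u, k)]

def landing_block(ys, start, k):
    """Follow jumps from 0-based position start until a block start is reached."""
    width, pos = 2 * k + 5, start
    while True:
        kind, dist = ys[pos]
        assert kind == "jmp"
        pos += dist
        if not 0 <= pos < len(ys):
            return None                    # the lane ran off the program
        if pos % width == 0:
            return pos // width            # 0-based index of the source instruction

xs = [("lab", "\\", 0), ("goto", "/", 1), ("op", "/", "a"),
      ("lab", "/", 0), ("lab", "/", 1), ("goto", "\\", 0)]
ys = translate(xs, 1)
```

With `k = 1` each source instruction occupies 7 positions, so the source instruction with 0-based index `b` starts at 0-based position `7 * b`, which is the 1-based position `(i - 1)(2k + 5) + 1` for `i = b + 1`.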
As stated in §3.1.1, the abort instruction does not enhance C's expressiveness, as any abort instruction can be replaced by a jump instruction with a sufficiently large jump counter. In §5.1 the first of three steps in the translation of C to PGA was the elimination of backward basic/test instructions. These observations naturally lead one to wonder whether C contains more redundant instructions. There are at least two ways to prove that this is indeed the case, both of which will be utilized in this chapter.

• On the one hand one can define a procedure M which, given an arbitrary regular thread $T$, constructs a $C$-expression $X$ such that $|X|^{i} = T$ for some $i \in [1, \ell(X)]$, using only a subset of all $C$ instructions, regardless of $T$. Clearly, any instruction which is not utilized by M irrespective of its input is redundant in the sense that it does not enhance C's expressiveness.

• On the other hand one can define a function $f$ on $\mathcal{I}_C$ which translates any given inseq $X$ to a behaviorally equivalent inseq $Y$, such that certain instructions will never be present in $Y$. Again, any such instruction can be deemed redundant from the point of view of expressiveness.

In our quest to trim C's instruction set we will inevitably stumble upon instruction sets which cannot express all regular threads. As we will later see, there is in fact a hierarchy of expressive power.

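Each $C$ or $Cg$ instruction $u$ has a dual $\overline{u}$: for forward instructions this is their backward counterpart, and vice versa. The abort and termination instructions are their own dual. Thus e.g. $\overline{/a} = \backslash a$, $\overline{-\backslash b} = -/b$ and $\overline{\#} = \#$. Observe that the dual operator is an involution: $\overline{\overline{u}} = u$ for all $u \in \mathcal{I}_C \cup \mathcal{I}_{Cg}$.

The anti-automorphism REV reverses a given instruction sequence and converts all its instructions to their dual. It is defined on $C$ as well as $Cg$ instruction sequences. For example,

$$\mathrm{REV}(+/a;\ !;\ \backslash\#2) = \overline{\backslash\#2};\ \overline{!};\ \overline{+/a} = /\#2;\ !;\ +\backslash a.$$

Observe that REV is an involution, because for all $i \in [1, \ell(X)]$:

$$\sigma_i(X) = \overline{\sigma_{\ell(X)-i+1}(\mathrm{REV}(X))} = \overline{\overline{\sigma_{\ell(X)-(\ell(X)-i+1)+1}(\mathrm{REV} \circ \mathrm{REV}(X))}} = \sigma_i(\mathrm{REV} \circ \mathrm{REV}(X)).$$

It is not hard to see that $|X|^{\rightarrow} = |\mathrm{REV}(X)|^{\leftarrow}$ for arbitrary inseq $X$. It follows that any code semigroup generated by some set $I \subseteq \mathcal{I}_C$ or $I \subseteq \mathcal{I}_{Cg}$ is exactly as expressive as the code semigroup generated by the set of its duals $\{\overline{u} \mid u \in I\}$. Thus REV tells us something about the expressiveness of subsemigroups of $C$ and $Cg$.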
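The dual operator and REV are easy to experiment with. The sketch below is an illustration only and assumes that instructions are written as plain strings in which `/` marks a forward and `\` a backward instruction, and that action names contain neither character; the function names are hypothetical.

```python
# Swap the direction markers; "!" and "#" contain neither and are their own dual.
_SWAP = str.maketrans("/\\", "\\/")

def dual(u):
    """Dual of a single instruction, e.g. "+/a" becomes "+\\a"."""
    return u.translate(_SWAP)

def rev(xs):
    """Reverse the instruction sequence and dualize every instruction."""
    return [dual(u) for u in reversed(xs)]

example = ["+/a", "!", "\\#2"]
reversed_example = rev(example)
```

Applying `rev` twice returns the original list, which mirrors the involution argument given above.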

The remainder of this chapter is organized as follows: in §6.1 we will be concerned with the expressiveness of several subsemigroups of $C$. Specifically, we will show that a reduction of $\mathcal{I}_C$ so that it contains only a finite number of forward or backward jump instructions (or both) reduces its expressiveness. In §6.2 we will combine the results of §6.1 with some of the translations defined in the previous chapter and use these to make some statements about the expressiveness of $Cg$ and some of its subsemigroups.

6.1 The Expressiveness of Subsemigroups of C 

In §5.1 it was shown that backward basic instructions and backward test instructions do not increase C's expressiveness, by means of a left behavior preserving endomorphism $f$ on $\mathcal{I}_C$ which does not output any of these instructions. In other words, the code semigroup generated by $\mathcal{I}_C$ without its backward basic, positive test and negative test instructions is as expressive as $C$ itself. This instruction set is not minimal, however, since the proper subset $\mathfrak{P}^{\rightarrow} \cup \mathfrak{J} \cup \{!\}$ suffices. This is demonstrated by the left behavior preserving endomorphism $g$, whose definition on individual $C$-instructions is displayed after the definition of the a-n-property below.



The next question which naturally arises is whether the instruction set $\mathfrak{P}^{\rightarrow} \cup \mathfrak{J} \cup \{!\}$ is minimal. For example, can we do with fewer than infinitely many jump instructions? And if not, will an infinite but otherwise arbitrary set of jump instructions suffice? We will now investigate those questions.
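That the endomorphism $g$ of this section preserves behavior can be tested on small examples. The following Python sketch is an assumption-laden illustration: instructions are strings, positions are 1-based, every action consumes one boolean reply, a jump cycle without actions counts as deadlock, and the names `parse`, `g` and `run` are hypothetical. It compares the behavior of $X$ from position $i$ with that of $g(X)$ from position $3(i-1)+1$.

```python
from itertools import product

# (sign, direction) -> the two jumps placed after the positive forward test
JUMPS = {("", 1): ("/#2", "/#1"), ("+", 1): ("/#2", "/#4"), ("-", 1): ("/#5", "/#1"),
         ("", -1): ("\\#4", "\\#5"), ("+", -1): ("\\#4", "\\#8"), ("-", -1): ("\\#7", "\\#5")}

def parse(u):
    """Split an instruction string into (kind, direction, argument)."""
    if u == "!":
        return ("term", 0, None)
    if u == "#":
        return ("abort", 0, None)
    sign = u[0] if u[0] in "+-" else ""
    rest = u[len(sign):]
    direction = 1 if rest[0] == "/" else -1
    if rest[1] == "#":
        return ("jump", direction, int(rest[2:]))
    return ("act", direction, (sign, rest[1:]))

def g(u):
    """Image of one C instruction under g: always three instructions."""
    kind, d, arg = parse(u)
    if kind == "term":
        return ["!", "!", "!"]
    if kind == "abort":
        return ["/#1", "\\#1", "!"]
    if kind == "jump":
        return [("/#" if d == 1 else "\\#") + str(3 * arg), "!", "!"]
    sign, a = arg
    on_true, on_false = JUMPS[(sign, d)]
    return ["+/" + a, on_true, on_false]

def run(xs, start, replies):
    """Execute xs from a 1-based position; return (actions performed, outcome)."""
    pos, trace, seen = start, [], set()
    while 1 <= pos <= len(xs):
        kind, d, arg = parse(xs[pos - 1])
        if kind == "term":
            return trace, "S"
        if kind == "abort":
            return trace, "D"
        if kind == "jump":
            if arg == 0 or pos in seen:     # jump cycle without actions
                return trace, "D"
            seen.add(pos)
            pos += d * arg
            continue
        sign, a = arg
        if len(trace) == len(replies):      # out of replies: action a is pending
            return trace, "pending " + a
        reply = replies[len(trace)]
        trace.append(a)
        seen.clear()
        skip = (sign == "+" and not reply) or (sign == "-" and reply)
        pos += d * (2 if skip else 1)
    return trace, "D"                       # left the sequence: deadlock

X = ["+/a", "\\b", "-/c", "/#2", "!", "#", "+\\a", "\\#3", "-\\b"]
gX = [v for u in X for v in g(u)]
```

Only positive forward tests, jumps and termination instructions occur in `gX`, which is the point of the construction.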

Recall the definition of the subsemigroup $C_{\le k}$ in §5.3. As defined by (5.1), the instruction set of $C_{\le k}$ does not contain jump instructions with a jump counter greater than $k$.

Theorem 6.1 (Bergstra & Ponse). Let $|A| \ge 2$. There does not exist a value $k \in \mathbb{N}$ such that $C_{\le k}$ can express all finite threads.

See the proof of Theorem 7 in [BP09a]; it has been replicated in Appendix B. See the proof of Theorem 6.2 below for a discussion.

Theorem 6.2. Let $A$ be non-empty. There does not exist a value $k \in \mathbb{N}$ such that $C_{\le k}$ can express all finite threads.

Proof. By Theorem 6.1 we conclude that if $|A| \ge 2$, then $C_{\le k}$ cannot express all finite threads. What remains to be proved is that the claim also holds if $|A| = 1$. We do this by "patching" the proof by Bergstra & Ponse. As their proof is rather long we will not repeat it here; instead we summarize some key aspects of the proof, point out why it requires that $|A| \ge 2$ and then proceed to show how this requirement can be eliminated. (Again, the proof is provided verbatim in Appendix B.)
The proof uses two key notions:

• Following the definition of residual threads in §2.1, the concept of $n$-residual threads is defined: $Q$ is a 0-residual thread of $P$ if $P = Q$. $Q$ is an $(n+1)$-residual thread of $P$ if $P = P_1 \unlhd a \unrhd P_2$ and $Q$ is an $n$-residual of either $P_1$ or $P_2$.

• Now a thread $P$ has the a-n-property if $\pi_n(P) = a^n \circ D$ and $P$ has $2^n - 1$ distinct $n$-residuals with a first approximation not equal to $a \circ D$.^1 An instruction sequence has the a-n-property if a thread with the a-n-property can be extracted from it.

^1 The sentences following this definition of the a-n-property in [BP09a] make it clear that $P$ is meant to have $2^n$ instead of $2^n - 1$ distinct $n$-residuals with a first approximation not equal to $a \circ D$. It turns out that this slightly weaker definition of the property does not affect the proof in any significant way.



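The endomorphism $g$ of §6.1 is defined on individual $C$-instructions by:

$$\begin{array}{lll}
/a \mapsto +/a;\ /\#2;\ /\#1, & \backslash a \mapsto +/a;\ \backslash\#4;\ \backslash\#5, & /\#k \mapsto /\#3k;\ !;\ !,\\
+/a \mapsto +/a;\ /\#2;\ /\#4, & +\backslash a \mapsto +/a;\ \backslash\#4;\ \backslash\#8, & \backslash\#k \mapsto \backslash\#3k;\ !;\ !,\\
-/a \mapsto +/a;\ /\#5;\ /\#1, & -\backslash a \mapsto +/a;\ \backslash\#7;\ \backslash\#5, & \# \mapsto /\#1;\ \backslash\#1;\ !,\\
 & & ! \mapsto !;\ !;\ !.
\end{array}$$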
Figure 6.1: Graphical representation of a thread $T$ with the a+5-property. The "leaves" $R_n$ in this tree represent pairwise distinct 5-residuals of $T$ which are each also distinct from any $m$-residual of $T$ for $m < 5$. This in turn means that all $m$-residuals for $m < 5$ are pairwise distinct. For if e.g. $P_1$ and $P_2$ are not distinct, then $Q_1$ and $Q_2$ are identical as well, violating $T$'s a+5-property. A similar argument holds for any pair of $m$-residuals with $m < 5$.



The proof by Bergstra and Ponse shows that for every $k \in \mathbb{N}$ there exists an $n \in \mathbb{N}^+$ such that no $C_{\le k}$-expression $X$ has the a-n-property. It does so by assuming the contrary and taking the minimal value for $k$ in this respect. It is then shown that, given arbitrary $n \in \mathbb{N}^+$, one can find an $X \in \mathcal{I}_{C_{\le k}}$ with the a-n-property for which it is also the case that $X \in \mathcal{I}_{C_{\le k-1}}$. This contradicts the assumption that $k$ was minimal.

Let $P$ be a thread with the a-n-property. There are two observations to be made. First, if $n \ge 1$, then the set $A$ of actions contains at least two elements, for otherwise the requirement that all first approximations of the distinct $n$-residuals of $P$ must not equal $a \circ D$ cannot be met.

Second, not only are all of $P$'s $n$-residuals distinct, by extension the same holds of all $m$-residuals with $m < n$. Moreover, since all first approximations of $n$-residuals of $P$ must not equal $a \circ D$, it follows that for any $m$-residual $Q$ and $m'$-residual $R$ with $0 \le m < m' \le n$ it is necessarily so that $Q \ne R$.

Summarizing that second observation, we see that every $m$-residual ($m \le n$) of a thread $P$ with the a-n-property is unique. As a result any instruction sequence with the a-n-property has at least $2^n - 1$ distinct test instructions with action $a$.

Analyzing the proof, it turns out that it relies specifically on this second observation about threads with the a-n-property; requiring that threads with the a-n-property ($n \ge 1$) contain non-$a$ actions is merely a means to that end. It turns out that we can define a slightly different class of threads with this second property without requiring that $|A| \ge 2$: we say that a thread $P$ has the a+n-property if $\pi_n(P) = a^n \circ D$ and $P$ has $2^n$ distinct $n$-residuals, none of which equals an $(n-m)$-residual of $P$ (for $m > 0$).

To see why every $m$-residual ($m \le n$) of a thread $P$ with the a+n-property is unique, assume the contrary: then there are values $m$ and $m'$ with $m < m' \le n$ such that some $m$-residual $Q$ of $P$ equals an $m'$-residual $R$ of $P$. But then every $(n - m')$-residual of $R$ equals some $(n - m')$-residual of $Q$. This yields a contradiction, because every $(n - m')$-residual of $R$ is an $n$-residual of $P$, which is by definition distinct from any $(n - m')$-residual of $Q$, because $m + (n - m') < n$. Figure 6.1 attempts to visualize this argument using a thread $T$ with the a+5-property.

For every $n \in \mathbb{N}^+$ a thread $P$ with the a+n-property can be created, such that $P$ performs only $a$ actions. Fix some $n$ and let $g: [0, 2^n - 1] \to \{true, false\}^n$ be a bijection, where $\{true, false\}^n$ is the set of all boolean sequences of length $n$. We write $(g(m))_{d+1}$ for the $(d+1)$th element of $g(m)$. Now we define the family of threads $P^l$ for all $1 \le l < 2^n$ such that:^2

$$P^l = \begin{cases} P^{2l} \unlhd a \unrhd P^{2l+1} & \text{if } l < 2^{n-1},\\ Q^{n}_{2l-2^n} \unlhd a \unrhd Q^{n}_{2l-2^n+1} & \text{otherwise,} \end{cases} \tag{6.1a}$$

$$Q^{0}_{m} = a \circ D, \tag{6.1b}$$

$$Q^{d+1}_{m} = \begin{cases} Q^{d}_{m} \unlhd a \unrhd D & \text{if } (g(m))_{d+1} = false,\\ D \unlhd a \unrhd Q^{d}_{m} & \text{otherwise.} \end{cases} \tag{6.1c}$$

Informally, the thread $P^1$ performs $n$ $a$-actions after which some state $Q^n_m$ is reached. Due to the nature of $g$, $Q^n_m \ne Q^n_{m'}$ for distinct $m$ and $m'$. For example, for $n = 2$ and taking $g$ such that

$$0 \mapsto \{false, false\},\quad 1 \mapsto \{false, true\},\quad 2 \mapsto \{true, false\},\quad 3 \mapsto \{true, true\},$$

the following equations are defined:

$$P^1 = P^2 \unlhd a \unrhd P^3,\quad P^2 = Q^2_0 \unlhd a \unrhd Q^2_1,\quad P^3 = Q^2_2 \unlhd a \unrhd Q^2_3,$$

and,

$$\begin{array}{llll}
Q^1_0 = Q^0_0 \unlhd a \unrhd D, & Q^1_1 = Q^0_1 \unlhd a \unrhd D, & Q^1_2 = D \unlhd a \unrhd Q^0_2, & Q^1_3 = D \unlhd a \unrhd Q^0_3,\\
Q^2_0 = Q^1_0 \unlhd a \unrhd D, & Q^2_1 = D \unlhd a \unrhd Q^1_1, & Q^2_2 = Q^1_2 \unlhd a \unrhd D, & Q^2_3 = D \unlhd a \unrhd Q^1_3,\\
Q^0_0 = a \circ D, & Q^0_1 = a \circ D, & Q^0_2 = a \circ D, & Q^0_3 = a \circ D.
\end{array}$$

Observe that any thread $Q^n_m$ performs $n + 1$ $a$-actions only if the sequence of boolean replies yielded by the first $n$ actions is exactly according to $g(m)$. Thus each thread $Q^n_m$ is a unique $n$-residual of $P^1$ (recall that $g$ is bijective). Since $D$ is a 1-residual of every thread $Q^n_m$, but not of any thread $P^l$, we conclude that $P^1$ meets the necessary criteria to have the a+n-property.

Replacing any thread with the a-n-property in the proof of Bergstra & Ponse with a thread with the a+n-property results in a valid proof which requires only that $|A| > 0$, as opposed to $|A| > 1$. This proves our claim. □
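The construction of a single-action thread with the a+n-property can be replayed in a few lines of Python. This is a sketch under stated assumptions: a thread over the single action $a$ is encoded as the string `"D"` or as a pair `(left, right)` standing for `left ⊴ a ⊵ right`, the bijection $g$ is passed as a list of boolean tuples, and the helper names are hypothetical. The indexing of $Q$ follows the convention that $Q^{d+1}_m$ consults the $(d+1)$th element of $g(m)$.

```python
from itertools import product

def build(n, g):
    """Thread P^1 of the family: n levels of P-nodes, then the chains Q^n_m."""
    def Q(d, m):
        if d == 0:
            return ("D", "D")                      # a o D
        sub = Q(d - 1, m)
        return ("D", sub) if g[m][d - 1] else (sub, "D")

    def P(l):
        if l < 2 ** (n - 1):
            return (P(2 * l), P(2 * l + 1))
        return (Q(n, 2 * l - 2 ** n), Q(n, 2 * l - 2 ** n + 1))

    return P(1)

def residuals(thread, depth):
    """Set of depth-residuals: the sub-threads reachable after depth actions."""
    level = {thread}
    for _ in range(depth):
        level = {child for t in level if t != "D" for child in t}
    return level

def has_a_plus_n_property(thread, n):
    shallow = set()
    for d in range(n):
        level = residuals(thread, d)
        if "D" in level:                           # fewer than n actions on some path
            return False
        shallow |= level
    deep = residuals(thread, n)
    # 2^n distinct n-residuals, none equal to a residual of smaller depth
    return len(deep) == 2 ** n and not (deep & shallow)

n = 3
g = list(product([False, True], repeat=n))        # one bijection [0, 2^n - 1] -> booleans^n
P1 = build(n, g)
```

The check uses structural equality of the finite trees, which coincides with equality of finite threads in this encoding.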

We have now established that arbitrarily many distinct jump instructions are required 
to let C express all finite threads. It turns out that jump instructions in a single direction 
suffice. 

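Proposition 6.3. Let $J^{\rightarrow} \subset \mathfrak{J}^{\rightarrow}$ be an infinite but otherwise arbitrary set of forward jump instructions and let the code semigroup $C'$ be generated by the instruction set $\mathfrak{P}^{\rightarrow} \cup J^{\rightarrow} \cup \{!\}$. Then $C'$ can express all finite threads but no infinite threads. This also holds if $\mathfrak{P}^{\rightarrow}$ is replaced by $\mathfrak{N}^{\rightarrow}$. If $J^{\leftarrow} \subset \mathfrak{J}^{\leftarrow}$ is an infinite but otherwise arbitrary set of backward jump instructions, then the instruction sets $\mathfrak{P}^{\leftarrow} \cup J^{\leftarrow} \cup \{!\}$ and $\mathfrak{N}^{\leftarrow} \cup J^{\leftarrow} \cup \{!\}$ also generate a code semigroup which characterizes BTA.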

Proof. As $C'$ does not contain backward instructions, it cannot create any kind of loop (for all $i, j \in \mathbb{Z}$, if $i \to_X j$ according to some $X \in \mathcal{I}_{C'}$, then necessarily $i < j$). Every instruction sequence is finite, thus so is any thread extracted from a $C'$-inseq $X$. What remains to be shown is that all BTA threads can be described by $C'$.

Let $P \in \mathrm{BTA}$ be a finite thread. We will inductively construct a $C'$ instruction sequence $X_P$ such that $|X_P|^{\rightarrow}_{C'} = P$. For convenience we define $F = \{\delta(u) \mid u \in J^{\rightarrow}\}$ to be the set of jump counters of admitted jump instructions.



^2 In this definition relevant values for $d$ and $m$ are in the ranges $[0, n - 1]$ and $[0, 2^n - 1]$, respectively.
If $P = S$ then define $X_P = {!}$. If $P = D$ then define $X_P = /\#k$ for some $k \in F$. Otherwise $P = Q \unlhd a \unrhd R$ for some $a \in A$ and $Q, R \in \mathrm{BTA}$. By induction there are $X_Q, X_R \in \mathcal{I}_{C'}$ such that $|X_Q|^{\rightarrow}_{C'} = Q$ and $|X_R|^{\rightarrow}_{C'} = R$.

Create an inseq $X'_R$ from $X_R$ by changing the jump counter $k$ of any jump instruction at an exit position in $X_R$ to some value $k' \in \{j \in F \mid j > k + \ell(X_Q)\}$. (These are the instructions which upon execution cause deadlock.)

Now we define $X_P = +/a;\ /\#k;\ X'_R;\ (!)^p;\ X_Q$, where $k \in \{j \in F \mid j > \ell(X'_R)\}$ and $p = k - \ell(X'_R) - 1$. It is not hard to see that indeed $|X_P|^{\rightarrow}_{C'} = P$. Note that the termination instructions introduced here are solely for the purpose of padding. They are not reachable from the leftmost instruction.

A similar construction can be made using negative tests. When using backward jump instructions create an inseq $X_P$ such that $|X_P|^{\leftarrow}_{C'} = P$. □
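The inductive construction can be sketched in Python as follows. This is a variant rather than a literal transcription: instead of patching the jump counters at exit positions at every induction step, the sketch leaves a placeholder for each deadlocking jump and fills all of them in at the end with one counter from $F$ that is at least the total length. The thread encoding (`"S"`, `"D"`, or a triple `(a, Q, R)` for `Q ⊴ a ⊵ R`), the `pick` function returning an element of $F$ not smaller than its argument, and all function names are assumptions of the sketch.

```python
def build(p, pick):
    """Forward-only instruction list for the finite thread p; None marks a deadlock jump."""
    if p == "S":
        return ["!"]
    if p == "D":
        return [None]
    a, q, r = p
    xq, xr = build(q, pick), build(r, pick)
    k = pick(len(xr) + 1)                    # k in F with k > length of X_R
    padding = ["!"] * (k - len(xr) - 1)      # unreachable filler
    return ["+/" + a, "/#%d" % k] + xr + padding + xq

def finalize(xs, pick):
    """Replace every placeholder by a jump that leaves the whole sequence."""
    counter = pick(len(xs))                  # from any position >= 1 this lands outside
    return [u if u is not None else "/#%d" % counter for u in xs]

def extract(xs, pos=1):
    """Thread extraction from a 1-based position, for forward-only sequences."""
    if not 1 <= pos <= len(xs):
        return "D"
    u = xs[pos - 1]
    if u == "!":
        return "S"
    if u.startswith("/#"):
        k = int(u[2:])
        return "D" if k == 0 else extract(xs, pos + k)
    # positive forward test "+/a": reply true continues at pos+1, false at pos+2
    return (u[2:], extract(xs, pos + 1), extract(xs, pos + 2))

def multiples_of_five(minimum):
    """Example choice of F: the positive multiples of 5."""
    return 5 * -(-minimum // 5)

P = ("a", ("b", "S", "D"), ("a", "D", "S"))
X = finalize(build(P, multiples_of_five), multiples_of_five)
```

Every jump counter in `X` is a multiple of 5, and extracting the thread from the leftmost instruction gives back `P`.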

Although all finite threads can be expressed using jump instructions in only one direction, 
this is not the case for all regular threads. In fact, infinitely many distinct jump instructions 
in both directions are necessary. 

Definition 6.4. In an instruction sequence $X = u_1; u_2; \ldots; u_k \in \mathcal{I}_C$ an instruction $u_j$ is $i$-$n$-relevant if there exists an instruction sequence $X'$, created from $X$ by changing $u_j$ to some other instruction $u \in \mathcal{I}_C$, such that $\pi_n(|X|^{i}) \ne \pi_n(|X'|^{i})$. In other words: the $n$th projection of the execution of inseq $X$ starting at position $i$ depends on $u_j$. Observe that any instruction which is $i$-$n$-relevant is also $i$-$(n+1)$-relevant.

Theorem 6.5. Let $A$ be non-empty and fix some $k \in \mathbb{N}$. Let $\mathcal{I}_{C'}$ be the largest subset of $\mathcal{I}_C$ which does not contain forward (backward) jump instructions with a jump counter greater than $k$ (i.e., $\mathcal{I}_{C'}$ contains a finite number of forward or backward jump instructions). Then the semigroup $C'$ generated by $\mathcal{I}_{C'}$ cannot express all regular threads.

Proof. Let $k$ be fixed and select $n$ such that $2^n > 2k + 3$. We will assume that $C'$ restricts forward jump instructions (a similar argument holds if backward jump instructions are restricted). Let $g: [0, 2^{2n} - 1] \to \{true, false\}^{2n}$ be a bijection, where $\{true, false\}^{2n}$ is the set of all boolean sequences of length $2n$. We write $(g(m))_{d+1}$ for the $(d+1)$th element of $g(m)$. Now we define the family of threads $P^l$ for all $1 \le l < 2^{2n}$ such that:^3

$$P^l = \begin{cases} P^{2l} \unlhd a \unrhd P^{2l+1} & \text{if } l < 2^{2n-1},\\ Q^{2n}_{2l-2^{2n}} \unlhd a \unrhd Q^{2n}_{2l-2^{2n}+1} & \text{otherwise,} \end{cases} \tag{6.2a}$$

$$Q^{0}_{m} = D, \tag{6.2b}$$

$$Q^{d+1}_{m} = \begin{cases} Q^{d}_{m} \unlhd a \unrhd P^{2^n + [m]_{2^n}} & \text{if } (g(m))_{d+1} = false,\\ P^{2^n + [m]_{2^n}} \unlhd a \unrhd Q^{d}_{m} & \text{otherwise.} \end{cases} \tag{6.2c}$$

^3 In this definition relevant values for $d$ and $m$ are in the ranges $[0, 2n - 1]$ and $[0, 2^{2n} - 1]$, respectively.

Figure 6.2 presents a graphical representation of thread $P^1$ for $n = 2$. Observe the similarities of this set of equations to those presented in (6.1). Recall from §5.3.1 that $[m]_{2^n}$ is the remainder of $m$ after division by $2^n$. Informally, the thread $P^1$ performs $2n$ $a$-actions after which some state $Q^{2n}_m$ is reached. Distinct sequences of boolean replies to these actions result in distinct values for $m$ ($0 \le m < 2^{2n}$). Due to the nature of $g$, $Q^{2n}_m \ne Q^{2n}_{m'}$ for distinct $m$ and $m'$. (To see why, observe that the $2n$-residual $D$ of $Q^{2n}_m$ can be reached starting in state $Q^{2n}_m$ only if the replies to the first $2n$ $a$-actions are precisely according to $g(m)$, and $g$ is a bijection.) Thus each thread $Q^{2n}_m$ is a unique $2n$-residual of $P^1$. Since $D$ is a $2n$-residual of every thread $Q^{2n}_m$, but not of any thread $P^l$, we conclude that $P^1$ meets the necessary criteria to have the a+2n-property.

Towards a contradiction assume that there exists a $C'$-expression $X$ such that $|X|^{i} = P^1$ for some $i \in [1, \ell(X)]$. We define $f(l) = \min\{i \mid |X|^{i} = P^l\}$ to be the function which
Figure 6.2: Graphical representation of the thread described by $P^1$ as defined by (6.2), for $n = 2$. Observe that the threads $P^4$, $P^5$, $P^6$ and $P^7$ (i.e. the threads $P^{2^n}$ through $P^{2^{n+1}-1}$) are $n$-residuals of $P^1$. Likewise each thread $Q^{2n}_m = Q^4_m$ is a $2n$-residual of $P^1$. Each thread $Q^4_m$ is distinct, and each of $P^1$'s $n$-residuals is a residual thread of each thread $Q^4_m$. Expanded are threads $Q^4_0$ and $Q^4_3$ which are defined according to $g(0) = \{false, false, false, false\}$ and $g(3) = \{false, false, true, true\}$, respectively. Note that $Q^0_m = D$ for all $m \in [0, 15]$, thus in particular $Q^0_0 = Q^0_3 = D$.

returns the leftmost position in $X$ from which the thread $P^l$ can be extracted. Without loss of generality we will assume that all instructions in $X$ are reachable from position $i$, for if not, then by Proposition 3.5 we can create an instruction sequence $X'$ for which this does hold. The largest jump counter of any forward jump instruction in $X'$ would be less than or equal to the largest forward jump distance in $X$.

For distinct $l, l' < 2^{2n}$ it is the case that $P^l \ne P^{l'}$ (because $P^1$ has the a+2n-property) and thus necessarily $f(l) \ne f(l')$. The $n$-residuals of $P^1$ are the threads $P^l$ for $l \in [2^n, 2^{n+1} - 1]$. The integers in this range are totally ordered by the function $f$:^4

$$l_0, l_1, \ldots, l_{2^n - 1}.$$

No instruction in $X$ is both $f(l_i)$-$n$-relevant and $f(l_j)$-$n$-relevant for distinct $i$ and $j$, because every thread $P^{l_i}$ is an $n$-residual of $P^1$, and $P^1$ has the a+2n-property. Moreover, the $n$-residuals of any thread $P^{l_i}$ are the threads $Q^{2n}_{(l_i - 2^n) \cdot 2^n + m}$, for $0 \le m < 2^n$. The thread $P^{2^n + m}$ in turn is a 1-residual (and a $2, 3, \ldots, 2n$-residual) of the thread $Q^{2n}_{(l_i - 2^n) \cdot 2^n + m}$. Thus every thread $P^{l_j}$ is a residual thread of every thread $P^{l_i}$.

Recall that $2^n > 2k + 3$ and that $C'$ does not contain forward jump instructions over a distance greater than $k$. Thus for some $i < k + 1$ all $f(l_i)$-$n$-relevant instructions are left of position $f(l_{k+1})$. For if not, then there are $k + 1$ distinct positions $< f(l_{k+1})$ containing jump instructions which target $k + 1$ distinct positions $> f(l_{k+1})$. This is not possible because of the restriction on forward jump counters.

Fix said $i$, and note that there are at least $k + 1$ instructions which are $f(l_i)$-$(n+1)$-relevant to the right of $f(l_{k+1})$, namely $f(l_{k+2}), f(l_{k+3}), \ldots, f(l_{2k+2})$. This leads to a contradiction, since this, too, is not possible because of the restriction on jump counters. □

^4 The ordering on $[2^n, 2^{n+1} - 1]$ imposed by $f$ does not need to be the natural ordering of these integers!
Now that it has been established that an upper bound on the value of jump counters limits expressiveness, even if only in a single direction, the question naturally arises whether any two infinite collections of forward and backward jump instructions suffice to express all regular threads. We prove that this is indeed the case.

Theorem 6.6. Let $J^{\rightarrow} \subset \mathfrak{J}^{\rightarrow}$ and $J^{\leftarrow} \subset \mathfrak{J}^{\leftarrow}$ be two infinite but otherwise arbitrary sets of jump instructions and let the code semigroup $C'$ be generated by the set $\mathfrak{P}^{\rightarrow} \cup J^{\rightarrow} \cup J^{\leftarrow} \cup \{!\}$. Then all regular threads can be expressed by $C'$. This also holds if $\mathfrak{P}^{\rightarrow}$ is replaced by $\mathfrak{P}^{\leftarrow}$, $\mathfrak{N}^{\rightarrow}$ or $\mathfrak{N}^{\leftarrow}$.

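Proof. Fix some infinite $J^{\rightarrow} \subset \mathfrak{J}^{\rightarrow}$ and $J^{\leftarrow} \subset \mathfrak{J}^{\leftarrow}$ and select an arbitrary regular thread $T$ with states $P_0, P_1, \ldots, P_{n-1}$. Then the result of the procedure ConstructInseq($T$, $\{\delta(u) \mid u \in J^{\rightarrow}\}$, $\{\delta(u) \mid u \in J^{\leftarrow}\}$) as outlined in Algorithm 6.1 is a $C'$-inseq $X$ such that $|X|^{\rightarrow} = T$.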



Algorithm 6.1 C-expression construction using a restricted set of jump counters
Require: A regular thread T with states P_0, P_1, ..., P_{n−1} and infinite sets F, B ⊆ N.
Ensure: A C-inseq X with |X|^→ = P_0 in which every forward jump counter is in F and every backward jump counter is in B.

procedure ConstructInseq(T, F, B)
    s ← RandomSelect({j ∈ F | j ≥ 4})
    z ← n · s · (s − 1)                      ▷ Largest (rightmost) instruction position
    I ← ∅                                    ▷ Set of (position, instruction) tuples
    for i ← 0 to n − 1 do
        for r ← 0 to s − 1 do
            c ← (i · s + r) · (s − 1) + 1
            if P_i = S then
                I ← I ∪ {(c, !)}
            else if P_i = D then
                d ← RandomSelect({j ∈ F | j > c})
                I ← I ∪ {(c, /#d)}           ▷ Jump outside program: deadlock
            else if P_i = P_j ⊴ a ⊵ P_k then
                I ← I ∪ {(c, +/a)}
                I ← I ∪ Connect(c + 1, j · s · (s − 1) + 1, z, s, F, B)
                z ← max{p | ∃u[(p, u) ∈ I]}
                I ← I ∪ Connect(c + 2, k · s · (s − 1) + 1, z, s, F, B)
                z ← max{p | ∃u[(p, u) ∈ I]}
            end if
        end for
    end for
    return ConcatInstructions(I ∪ {(p, !) | 0 < p < z, ¬∃u[(p, u) ∈ I]})
end procedure

procedure Connect(i, j, z, s, F, B)
    r ← i + RandomSelect({k ∈ F | i + k > z})
    l ← r − RandomSelect({k ∈ B | r − k < j})
    p ← ⌊(j − l)/s⌋
    p ← p + j − (l + p · s)
    return {(i, /#(r − i))} ∪ {(r + k · s, /#s) | 0 ≤ k < p} ∪ {(r + p · s, \#(r − l))}
end procedure



Suppose we want to transfer control of execution in an inseq $X$ from position $i$ to position $j$. Obviously, $J^{\rightarrow} \cup J^{\leftarrow}$ may not contain the jump instruction required to jump immediately from $i$ to $j$. In fact, it may be so that no sequence of jump instructions permitted by $J^{\rightarrow} \cup J^{\leftarrow}$ can transfer control of execution from position $i$ to $j$. For example, if only even jump counters are available, then control of execution cannot be transferred from $i$ to $j$ if $i - j$ is odd.

Algorithm 6.1 solves this issue by producing an instruction sequence $X$ in which functionally equivalent subsequences of instructions are repeated $s$ times at evenly spaced intervals of length $s - 1$. The value of $s$ is selected from the set of permissible forward jump counters of $J^{\rightarrow}$, with the sole restriction that $s \ge 4$. Thus, for any $P_k$ there are at least $s$ positions $j_0, j_1, \ldots, j_{s-1}$ (with $j_{m+1} = j_m + s - 1$) in $X$ from which $P_k$ can be extracted and for any position $i$ in $X$ there is at least one such position $j_m$ such that $i \equiv j_m \pmod{s}$.

Now the general procedure to "connect" a position $i$ to one such $j_m$ in $X$ using a sequence of permissible jump instructions is to extend $X$ with a sequence of jump instructions to the right of $X$, as follows. First, select a sufficiently large forward jump instruction $f$ which, if placed at position $i$, jumps outside of $X$ to some position $r$. Second, select a sufficiently large backward jump instruction $b$ which, if placed at position $r$, jumps to a position $l < j_0$. Now observe that, instead of placing $b$ at position $r$, we can add a sequence of chained $/\#s$ instructions, starting at position $r$ and extending to the right, such that they transfer control of execution to some position $r' > r$. $r'$ can be selected such that if the backward instruction $b$ were placed there, it would jump to a position $l'$ between $j_0 - (s - 1)$ and $j_0$. By adding another $j_0 - l'$ chained $/\#s$ instructions starting at position $r'$, control of execution will be transferred to a position $r'' > r'$ from which the instruction $b$ will target exactly one of the positions $j_m$. Specifically, $m = j_0 - l'$: each additional $/\#s$ moves the target of $b$ by $s$ positions, that is, one copy ($s - 1$ positions) further plus one position closer to that copy. The procedure described here is performed by Connect($i$, $j_0$, $\ell(X)$, $s$, $\{\delta(u) \mid u \in J^{\rightarrow}\}$, $\{\delta(u) \mid u \in J^{\leftarrow}\}$), which returns the required jump instructions and the positions where they should be placed.

The procedure ConstructInseq($T$, $\{\delta(u) \mid u \in J^{\rightarrow}\}$, $\{\delta(u) \mid u \in J^{\leftarrow}\}$) selects a suitable value $s$ and ensures that for every thread $P_i$ there are $s$ positions $j_0, j_1, \ldots, j_{s-1}$ from which $P_i$ can be extracted. At each of these positions it places a suitable instruction: $!$ if $P_i = S$, $\#$ if $P_i = D$ and $+/a$ if $P_i = P_{i'} \unlhd a \unrhd P_{i''}$. In the latter case Connect(...) is used to ensure that indeed either of $P_{i'}$ and $P_{i''}$ will be reached after execution of action $a$. □
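The procedure of Algorithm 6.1 can be prototyped and tested against a small regular thread. The Python sketch below makes several assumptions that are not part of the algorithm as stated: the infinite sets $F$ and $B$ are represented by functions `pick_f` and `pick_b` that return an element not smaller than their argument, selection is deterministic instead of random, a regular thread is a list of states `"S"`, `"D"` or `(a, j, k)` for $P_j \unlhd a \unrhd P_k$, and the deadlock jumps are filled in only after the final length is known so that they are guaranteed to leave the program.

```python
from itertools import product

def connect(i, j, z, s, pick_f, pick_b):
    """Jumps that lead from position i to one of the s copies of the state at j."""
    r = i + pick_f(z - i + 1)                # forward jump lands right of z
    l = r - pick_b(r - j + 1)                # backward jump from r would land left of j
    q, rem = divmod(j - l, s)
    p = q + rem                              # number of chained /#s instructions
    placed = {i: "/#%d" % (r - i)}
    for t in range(p):
        placed[r + t * s] = "/#%d" % s
    placed[r + p * s] = "\\#%d" % (r - l)    # targets j + rem * (s - 1)
    return placed

def construct(states, pick_f, pick_b):
    s = pick_f(4)
    z = len(states) * s * (s - 1)
    prog, dead = {}, []
    for i, state in enumerate(states):
        for r in range(s):                   # s copies, spaced s - 1 apart
            c = (i * s + r) * (s - 1) + 1
            if state == "S":
                prog[c] = "!"
            elif state == "D":
                dead.append(c)               # filled in once the length is known
            else:
                a, j, k = state
                prog[c] = "+/" + a
                for offset, target in ((1, j), (2, k)):
                    prog.update(connect(c + offset, target * s * (s - 1) + 1,
                                        z, s, pick_f, pick_b))
                    z = max(prog)
    for c in dead:
        prog[c] = "/#%d" % pick_f(z)         # jump outside the program: deadlock
    return [prog.get(p, "!") for p in range(1, z + 1)]

def run_inseq(xs, replies):
    pos, trace = 1, []
    for _ in range(100000):                  # generous bound on executed steps
        if not 1 <= pos <= len(xs):
            return trace, "D"
        u = xs[pos - 1]
        if u == "!":
            return trace, "S"
        if u[1] == "#":
            pos += int(u[2:]) if u[0] == "/" else -int(u[2:])
            continue
        if len(trace) == len(replies):
            return trace, "pending"
        pos += 1 if replies[len(trace)] else 2
        trace.append(u[2:])
    raise RuntimeError("step bound exceeded")

def run_thread(states, replies):
    i, trace = 0, []
    while states[i] not in ("S", "D"):
        if len(trace) == len(replies):
            return trace, "pending"
        a, j, k = states[i]
        i = j if replies[len(trace)] else k
        trace.append(a)
    return trace, states[i]

pick_f = lambda m: 4 * -(-m // 4)            # F: positive multiples of 4
pick_b = lambda m: 6 * -(-m // 6)            # B: positive multiples of 6
T = [("a", 1, 2), ("b", 0, 3), "D", "S"]
X = construct(T, pick_f, pick_b)
```

Both example sets contain only even counters, which is exactly the situation in which a direct connection between two positions can be impossible; the repeated copies of each state are what make the construction work regardless.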



6.2 The Expressiveness of Subsemigroups of Cg 

Equipped with the translations of Chapter 5 and the theorems of §6.1 we are now ready to make statements about the expressiveness of $Cg$ and some of its subsemigroups.

Proposition 6.7. Each thread definable in $Cg$ is regular, and each regular thread can be expressed in $Cg$.

Proof. This follows immediately from the fact that C2CG and CG2C are behavior preserving and total. Since $C$ characterizes the regular threads (see Proposition 3.1), so does $Cg$. □

Theorem 6.8. Let $A$ be non-empty. There does not exist a value $k \in \mathbb{N}$ such that $Cg_{\le k}$ can express all finite threads.

Proof. Upon analyzing the family of translations $\mathrm{CG2C}_k$ as defined in §5.4.1 we see that they map $Cg_{\le k}$-expressions to behaviorally equivalent $C_{\le 4k+12}$-expressions.

Thus if $Cg_{\le k}$ can express all finite threads, then so can $C_{\le 4k+12}$. But by Theorem 6.2 this is impossible. □

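Proposition 6.9. Let $G^{\rightarrow} \subset \mathfrak{G}^{\rightarrow}$ be an infinite but otherwise arbitrary set of forward goto instructions and let $L^{\rightarrow} \subset \mathfrak{L}^{\rightarrow}$ constitute the set of label instructions which match the goto instructions in $G^{\rightarrow}$. Then the code semigroup $Cg'$ generated by the instruction set $\mathfrak{P}^{\rightarrow} \cup G^{\rightarrow} \cup L^{\rightarrow} \cup \{!\}$ can express all finite threads but no infinite threads. This also holds if $\mathfrak{P}^{\rightarrow}$ is replaced by $\mathfrak{N}^{\rightarrow}$. If the infinite sets $G^{\leftarrow} \subset \mathfrak{G}^{\leftarrow}$ and $L^{\leftarrow} \subset \mathfrak{L}^{\leftarrow}$ are defined analogously, then the instruction sets $\mathfrak{P}^{\leftarrow} \cup G^{\leftarrow} \cup L^{\leftarrow} \cup \{!\}$ and $\mathfrak{N}^{\leftarrow} \cup G^{\leftarrow} \cup L^{\leftarrow} \cup \{!\}$ also generate a code semigroup capable of expressing all finite threads.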
Proof. As in the proof of Proposition 6.3 we observe that $Cg'$ does not contain backward instructions. Thus it can only express finite threads, as loops (a requirement for infinite behavior) cannot be constructed in $Cg'$. Now we need to show that all BTA threads can be expressed by $Cg'$.

We will inductively define a $Cg'$ instruction sequence $X_P$ for every $P \in \mathrm{BTA}$ such that $|X_P|^{\rightarrow}_{Cg'} = P$. Let $F = \{\lambda(u) \mid u \in G^{\rightarrow}\}$ be the set of label numbers of available goto instructions.

If $P = S$ then $X_P = {!}$. If $P = D$ then set $X_P = /\#\#\pounds l$, where $l$ is an arbitrary element of $F$. Otherwise $P = Q \unlhd a \unrhd R$ and there are $X_Q, X_R \in \mathcal{I}_{Cg'}$ such that $|X_Q|^{\rightarrow}_{Cg'} = Q$ and $|X_R|^{\rightarrow}_{Cg'} = R$. Select some label number $l \in F$ such that it is not present in $X_Q$ or $X_R$. Then

$$X_P = +/a;\ /\#\#\pounds l;\ X_R;\ /\pounds l;\ X_Q.$$

A similar construction can be made using negative tests. When using backward goto instructions create an inseq $X_P$ such that $|X_P|^{\leftarrow}_{Cg'} = P$. □

Theorem 6.10. Let $A$ be non-empty and fix some value $k \in \mathbb{N}^+$. Let $\mathcal{I}_{Cg'}$ be the largest subset of $\mathcal{I}_{Cg}$ which does not contain forward (backward) goto instructions with a label number $k$ or greater (i.e., $\mathcal{I}_{Cg'}$ contains a finite number of forward or backward goto instructions). Then the semigroup $Cg'$ generated by $\mathcal{I}_{Cg'}$ cannot express all regular threads.

Proof. The proof is analogous to that of Theorem 6.5. Again select $n$ such that $2^n > 2k + 3$ and consider the thread $P^1$ as defined by (6.2). As before the function $f(l) = \min\{i \mid |X|^{i} = P^l\}$ induces a total ordering on the range $[2^n, 2^{n+1} - 1]$, say $l_0, l_1, \ldots, l_{2^n - 1}$. Observe that for some $i < k + 1$ all $f(l_i)$-$n$-relevant instructions are left of position $f(l_{k+1})$, for otherwise there must be $k + 1$ distinct goto instructions on positions $< f(l_{k+1})$ which target $k + 1$ distinct label instructions on positions $> f(l_{k+1})$; this is impossible, as $\mathcal{I}_{Cg'}$ contains only $k$ distinct forward goto instructions.

Fixing said $i$ we note that there are at least $k + 1$ positions which are $f(l_i)$-$(n+1)$-relevant to the right of $f(l_{k+1})$. This too is impossible, for the same reason. Contradiction. □
Discussion



This thesis can be divided into four parts: the introduction of C and the theory behind it, 
the introduction of Cg as an alternative to C, the definition of translations between these, 
and several results about the expressiveness of C and Cg. 

We have proved that $C$ and $Cg$ are equally expressive by means of the total mappings C2CG and CG2C. We have also proved that such translations are only possible if the maximum jump counter (or label number) in the input inseq is known. As a result C2CG and CG2C cannot be homomorphic.

We then went on to prove that any subsemigroup of $C$ ($Cg$) needs to contain infinitely many jump instructions (matching label and goto instructions) in order to express all finite threads (Theorem 6.2, Theorem 6.8). In order to express all regular threads it is even necessary that such a semigroup contains infinitely many jump instructions (label/goto instructions) in both directions (Theorem 6.5, Theorem 6.10). The upshot is that any such infinite collection of jump instructions (label/goto instructions) suffices (Theorem 6.6; the corresponding result for $Cg$ is trivial).

7.1 Further Work 

The translations between $C$ and $Cg$ in Chapter 5 use label and goto instructions to mimic the behavior of jump instructions and vice versa. There are some open questions about the nature of these translations: it is not known whether alternative behavior preserving mappings can be defined which employ fewer jump instructions or label/goto instructions. More precisely,

• Given an arbitrary value $k \in \mathbb{N}$, what is the smallest value $k' \in \mathbb{N}$ for which there exists a behavior preserving mapping $f: \mathcal{I}_{Cg_{\le k}} \to \mathcal{I}_{C_{\le k'}}$? (By definition of equation (5.8) in §5.4.1 we already know that $k' \le 4k + 12$.)

• As demonstrated by the translations defined in §5.3 there exist behavior preserving mappings $f: \mathcal{I}_{C_{\le k}} \to \mathcal{I}_{Cg_{\le k}}$ for $k \ge 2$. Is there any value $k \in \mathbb{N}$ such that for some $k' < k$ the mapping $f: \mathcal{I}_{C_{\le k}} \to \mathcal{I}_{Cg_{\le k'}}$ is behavior preserving?
Appendix



Overview of Defined Translations



Figure A.1 provides a graphical representation of the most important sets of (single pass) instruction sequences introduced in this thesis. Recall that the set $\mathcal{I}_C$ contains all $C$-expressions. For arbitrary $k \in \mathbb{N}$, $\mathcal{I}_{C_{\le k}}$ is the largest subset of $\mathcal{I}_C$ which does not contain $C$-inseqs with relative jumps over a distance greater than $k$. Similarly, $\mathcal{I}_{Cg}$ contains all $Cg$-expressions, and $\mathcal{I}_{Cg_{\le k}}$ contains those inseqs without goto instructions with a label number greater than $k$. All PGA terms are contained in $P$; the set $P_1$ is the largest set which is restricted to single pass instruction sequences in first canonical form. $P_2$ contains PGA's second canonical forms.



Figure A.1: Overview of semigroups and single-pass instruction sequences and certain behavior preserving mappings defined between them, as introduced in this thesis. Dotted arrows represent homomorphisms. There is also a non-homomorphic version of $\mathrm{C2CG}_k$ (§5.3.1).
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Proof by Bergstra &. Ponse 



The proof of Theorem 6.1 is presented in Section 9 of [BP09a]. As the proof of Theorem 6.2 
builds upon this result, Section 9 of [BP09a] is reproduced here verbatim, with kind permission 
of the authors. Three minor changes have been applied: a section reference has 
been updated to point to an equivalent section in this thesis, a footnote has been added and 
the last paragraph has been left out, as it is merely an introduction to Section 10 of that 
publication. 

Observe that [BP09a] uses notation which in some places differs slightly from notation 
introduced in this thesis. 

B.1 Expressiveness and reduced instruction sets 

In this section we further consider C's instructions in the perspective of expres- 
siveness. We show that setting a bound on the size of jump counters in C does 
have consequences with respect to expressiveness: let 

$C_k$ 

be defined by allowing only jump instructions with counter value $k$ or less. 

We first introduce some auxiliary notions: following the definition of residual 
threads in Section 2.1 we say that thread $Q$ is a 0-residual of thread $P$ if $P = Q$, 
and an $(n + 1)$-residual of $P$ if for some $a \in A$, $P = P_1 \unlhd a \unrhd P_2$ and $Q$ is an 
$n$-residual of $P_1$ or of $P_2$. Note that a finite thread (in BTA) only has $n$-residuals 
for finitely many $n$, while for the thread $P$ defined by $P = a \circ P$ it holds that $P$ 
is an $n$-residual of itself for each $n \in \mathbb{N}$. 

Let $a \in A$ be fixed and $n \in \mathbb{N}^+$. Thread $P$ has the $a$-$n$-property if $\pi_n(P) = a^n \circ D$ 
and $P$ has $2^n - 1$ (different) $n$-residuals which all have a first approximation not 
equal to $a \circ D$.¹ So, if a thread $P$ has the $a$-$n$-property, then $n$ consecutive 
$a$-actions can be executed and each sequence of $n$ replies leads to a unique 
$n$-residual. Moreover, none of these residual threads starts with an $a$-action (by 
the requirement on their first approximation). We note that for each $n \in \mathbb{N}^+$ we 
can find a finite thread with the $a$-$n$-property. In the next section we return to 
this point. 

A piece of code $X$ has the $a$-$n$-property if for some $i$, $|X|_i$ has this property. It 
is not hard to see that in this case $X$ contains at least $2^n - 1$ different $a$-tests. 



¹ It appears that the authors meant to use $2^n$ instead of $2^n - 1$ in this sentence, though this does not 
affect the proof in any serious way. (Stephan) 






As an example, consider 

X = !; \b; +\a; +/a; \#2; +/a; /#2; /c; # 

Clearly, $X$ has the $a$-2-property because $|X|_4$ has this property: its 2-residuals 
are $b \circ S$, $S$, $D$ and $c \circ D$, so each thread is not equal to one of the others and 
does not start with an $a$-action. 

Note that if a piece of code $X$ has the $a$-$(n + k)$-property, then it also has the 
$a$-$n$-property. In the example above, $X$ has the $a$-1-property because $|X|_3$ has 
this property (and $|X|_6$ too). 
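
The example can be checked mechanically. The following Python sketch is not part of [BP09a]. It assumes an informal reading of C's instructions: a basic instruction moves one position in its own direction, a test skips one position on the reply that does not match its sign, a jump moves the stated number of positions, and running off either end or cycling through jumps yields D. The string encoding of instructions and all function names are assumptions made for this illustration only. Following the footnote, it requires all $2^n$ residuals to be pairwise different, and it compares threads by a printed form that is only exact for finite threads.

```python
from itertools import product

# The example X from the text, positions 1..9 (string encoding is our own choice).
X = ["!", r"\b", r"+\a", "+/a", r"\#2", "+/a", "/#2", "/c", "#"]

def parse(instr):
    """Return (kind, direction, payload); direction is +1 (forward) or -1 (backward)."""
    if instr == "!":
        return ("term", 0, None)
    if instr == "#":
        return ("abort", 0, None)
    sign = instr[0] if instr[0] in "+-" else ""
    body = instr[len(sign):]
    direction = 1 if body[0] == "/" else -1
    rest = body[1:]
    if rest.startswith("#"):
        return ("jump", direction, int(rest[1:]))
    return (sign or "basic", direction, rest)

def resolve(prog, pos):
    """Follow jumps from pos; return the position of a non-jump instruction, or None for D."""
    seen = set()
    while 1 <= pos <= len(prog):
        kind, d, arg = parse(prog[pos - 1])
        if kind != "jump":
            return pos
        if pos in seen:          # jump cycle without actions
            return None
        seen.add(pos)
        pos += d * arg
    return None                  # left the instruction sequence

def successors(prog, pos):
    """For an action at pos: (action, position after reply True, position after reply False)."""
    kind, d, a = parse(prog[pos - 1])
    if kind == "basic":
        return a, pos + d, pos + d
    if kind == "+":
        return a, pos + d, pos + 2 * d
    return a, pos + 2 * d, pos + d   # kind == "-"

def first_action(prog, pos):
    """Action performed first when starting at pos, or None for S and D."""
    pos = resolve(prog, pos)
    if pos is None:
        return None
    kind, _, a = parse(prog[pos - 1])
    return a if kind in ("basic", "+", "-") else None

def thread(prog, pos, depth=16):
    """Printed form of the thread extracted at pos, cut off at the given depth."""
    pos = resolve(prog, pos)
    if pos is None:
        return "D"
    kind = parse(prog[pos - 1])[0]
    if kind == "term":
        return "S"
    if kind == "abort":
        return "D"
    if depth == 0:
        return "..."
    a, t, f = successors(prog, pos)
    pt, pf = thread(prog, t, depth - 1), thread(prog, f, depth - 1)
    return f"{a} o {pt}" if pt == pf else f"({pt} <| {a} |> {pf})"

def residuals(prog, start, n, action="a"):
    """Map each reply sequence of length n to (position reached, residual thread);
    None if some step among the first n is not an `action` step."""
    out = {}
    for replies in product([True, False], repeat=n):
        pos = start
        for reply in replies:
            pos = resolve(prog, pos)
            if pos is None or parse(prog[pos - 1])[0] in ("term", "abort"):
                return None
            a, t, f = successors(prog, pos)
            if a != action:
                return None
            pos = t if reply else f
        out[replies] = (pos, thread(prog, pos))
    return out

def has_property(prog, start, n, action="a"):
    """Check the a-n-property for thread extraction starting at position `start`."""
    res = residuals(prog, start, n, action)
    if res is None:
        return False
    threads = [t for _, t in res.values()]
    distinct = len(set(threads)) == len(threads)
    return distinct and all(first_action(prog, p) != action for p, _ in res.values())

for replies, (pos, t) in residuals(X, 4, 2).items():
    print(replies, pos, t)
```

Under these assumptions, starting at position 4 yields the four residuals named in the text, and `has_property(X, 3, 1)` and `has_property(X, 6, 1)` both hold. The positions reported by `residuals` play the role of the values $f_n(\alpha)$ used in the proof of Lemma 1.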

Lemma 1. For each $k \in \mathbb{N}$ there exists $n \in \mathbb{N}^+$ such that no $X \in C_k$ has the 
$a$-$n$-property. 

Proof. Suppose the contrary and let $k$ be minimal in this respect. Assume for 
each $n \in \mathbb{N}^+$, $Y_n \in C_k$ has the $a$-$n$-property. 

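Let $B = \{\mathit{true}, \mathit{false}\}$. For $\alpha, \beta \in B^*$ we write 

$\alpha \preceq \beta$ 

if $\alpha$ is a prefix of $\beta$, and we write $\alpha \prec \beta$ or $\beta \succ \alpha$ if $\alpha \preceq \beta$ and $\alpha \neq \beta$. 
Furthermore, let 

$B^{\leq n} = \bigcup_{i=0}^{n} B^i$, 

thus $B^{\leq n}$ contains all $B^*$-sequences $\alpha$ with $\ell(\alpha) \leq n$ (there are $2^{n+1} - 1$ such 
sequences). 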
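
The count and the prefix order can be illustrated with a few lines of Python. This is only a sketch: reply sequences are encoded as tuples of booleans, and the names `sequences_up_to` and `is_prefix` are hypothetical helpers, not notation from [BP09a].

```python
from itertools import product

def sequences_up_to(n):
    """All reply sequences over B = {True, False} of length at most n."""
    return [seq for i in range(n + 1) for seq in product([True, False], repeat=i)]

def is_prefix(alpha, beta):
    """True if alpha is a (not necessarily proper) prefix of beta."""
    return beta[:len(alpha)] == alpha

# There is one sequence of length 0, two of length 1, four of length 2, and so on,
# which sums to 2^(n+1) - 1.
for n in range(6):
    assert len(sequences_up_to(n)) == 2 ** (n + 1) - 1
```

Note that the empty sequence is a prefix of every sequence, so it is the least element of the prefix order.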

Let $g \colon \mathbb{N} \to \mathbb{N}$ be such that $|Y_n|_{g(n)}$ has the $a$-$n$-property. Define 

$f_n \colon B^{\leq n} \to \mathbb{N}^+$ 

by $f_n(\alpha) = m$ if the instruction reached in $Y_n$ when execution started at position 
$g(n)$ after the replies to $a$ according to $\alpha$ has position $m$. Clearly, $f_n$ is an injective 
function. 

In the following claim we show that under the supposition made in this proof 
a certain form of squeezing holds: if $k'$ is sufficiently large, then for all $n > 0$ 
there exist $\alpha, \beta, \gamma \in B^{k'}$ with $f_{k'+n}(\alpha) < f_{k'+n}(\beta) < f_{k'+n}(\gamma)$ with the property 
that $f_{k'+n}(\alpha) < f_{k'+n}(\beta') < f_{k'+n}(\gamma)$ for each extension $\beta'$ of $\beta$ within $B^{\leq k'+n}$. 
This claim is proved by showing that not having this property implies that "too 
many" such extensions $\beta'$ exist. Using this claim it is not hard to contradict the 
minimality of $k$. 

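Claim 1. Let $k'$ satisfy $2^{k'} \geq 2k + 3$. Then for all $n > 0$ there exist $\alpha, \beta, \gamma \in B^{k'}$ 
with 

$f_{k'+n}(\alpha) < f_{k'+n}(\beta) < f_{k'+n}(\gamma)$ 

such that for each extension $\beta' \succeq \beta$ in $B^{\leq k'+n}$, 

$f_{k'+n}(\alpha) < f_{k'+n}(\beta') < f_{k'+n}(\gamma)$. 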

Proof of Claim 1. Let $k'$ satisfy $2^{k'} \geq 2k + 3$. Towards a contradiction, suppose 
the stated claim is not true for some $n > 0$. The sequences in $B^{k'}$ are totally 
ordered by $f_{k'+n}$, say 

$f_{k'+n}(\alpha_1) < f_{k'+n}(\alpha_2) < \ldots < f_{k'+n}(\alpha_{2^{k'}})$. 



Consider the following list of sequences: 

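$\alpha_1,\ \underbrace{\alpha_2, \ldots, \alpha_{2k+2}}_{\text{choices for } \beta},\ \alpha_{2k+3}$ 

By supposition there is for each choice $\beta \in \{\alpha_2, \ldots, \alpha_{2k+2}\}$ an extension $\beta' \succ \beta$ 
in $B^{\leq k'+n}$ with 

either $f_{k'+n}(\beta') < f_{k'+n}(\alpha_1)$, or $f_{k'+n}(\beta') > f_{k'+n}(\alpha_{2k+3})$. 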

Because there are $2k + 1$ choices for $\beta$, assume that at least $k + 1$ elements 
$\beta \in \{\alpha_2, \ldots, \alpha_{2k+2}\}$ have an extension $\beta'$ with 

$f_{k'+n}(\beta') < f_{k'+n}(\alpha_1)$ 

(the assumption $f_{k'+n}(\beta') > f_{k'+n}(\alpha_{2k+3})$ for at least $k + 1$ elements $\beta$ with 
extension $\beta'$ leads to a similar argument). Then we obtain a contradiction with 
respect to $f_{k'+n}$: for each of the sequences $\beta$ in the subset just selected and its 
extension $\beta'$, 

$f_{k'+n}(\beta') < f_{k'+n}(\alpha_1) < f_{k'+n}(\beta)$, 

and there are at least $k + 1$ different such pairs $\beta, \beta'$ (recall $f_{k'+n}$ is injective). 
But this is not possible with jumps of at most $k$ because the $f_{k'+n}$ values of 
each of these pairs define a path in $Y_{k'+n}$ that never has a gap that exceeds $k$ 
and that passes position $f_{k'+n}(\alpha_1)$, while different paths never share a position. 
This finishes the proof of Claim 1. □ 
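
The two counting steps used in the proof of Claim 1 can be spelled out as follows. This is only a restatement of the argument, not an addition to it; the names $m_<$ and $m_>$ are introduced here for illustration and do not occur in [BP09a].

```latex
% m_< : number of choices \beta having an extension \beta' with
%       f_{k'+n}(\beta') < f_{k'+n}(\alpha_1)
% m_> : number of choices \beta having an extension \beta' with
%       f_{k'+n}(\beta') > f_{k'+n}(\alpha_{2k+3})
\begin{align*}
  |B^{k'}| = 2^{k'} &\ge 2k+3
    && \text{so } \alpha_1,\dots,\alpha_{2k+3} \text{ exist},\\
  m_< + m_> &\ge |\{\alpha_2,\dots,\alpha_{2k+2}\}| = 2k+1
    && \text{every choice has such an extension},\\
  \max(m_<, m_>) &\ge \left\lceil \tfrac{2k+1}{2} \right\rceil = k+1
    && \text{pigeonhole principle}.
\end{align*}
```

The final line is what allows the proof to assume, without loss of generality, that at least $k + 1$ choices for $\beta$ have an extension on the same side.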

Take according to Claim 1 an appropriate value $k'$, some value $n > 0$ and 
$\alpha, \beta, \gamma \in B^{k'}$. Consider $Y_{k'+n}$ and mark the positions that are used for the 
computations according to $\alpha$ and $\gamma$: these computations both start in position 
$g(k' + n)$ and end in $f_{k'+n}(\alpha)$ and $f_{k'+n}(\gamma)$, respectively. Note that the set of 
marked positions never has a gap that exceeds $k$. 

Now consider a computation that starts from instruction $f_{k'+n}(\beta)$ in $Y_{k'+n}$, a 
position in between $f_{k'+n}(\alpha)$ and $f_{k'+n}(\gamma)$. By Claim 1 the first $n$ $a$-instructions 
have positions in between $f_{k'+n}(\alpha)$ and $f_{k'+n}(\gamma)$ and none of these are marked. 
Leaving out all marked positions and adjusting the associated jumps yields a 
piece of code, say $Y$, with smaller jumps, thus in $C_{k-1}$, that has the $a$-$n$-property. 
Because $n$ was chosen arbitrarily, this contradicts the initial supposition that $k$ 
was minimal. □ 

Theorem 1. For any $k \in \mathbb{N}^+$, not all threads in BTA can be expressed in $C_k$. 
This is also the case if thread extraction may start at arbitrary positions. 



Proof. Fix some value $k$. Then, by Lemma 1 we can find a value $n$ such that no 
$X \in C_k$ has the $a$-$n$-property. But we can define a finite thread that has this 
property. □ 
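
One way to obtain such a finite thread is a complete binary tree of $n$ levels of $a$-tests whose leaves are pairwise different and do not start with $a$. The Python sketch below builds this witness and checks the property on finite thread terms. It is an illustration under our own assumptions, not the construction from [BP09a]: threads are encoded as nested tuples, the leaves are chosen as $b^j \circ S$, and all function names are hypothetical.

```python
# Finite threads as nested tuples: "S", "D", or (P_true, action, P_false),
# where (P, a, Q) stands for P <| a |> Q and (P, a, P) for a o P.

def prefix(action, P):
    """Action prefix: action o P."""
    return (P, action, P)

def power(action, n, tail="D"):
    """The thread action^n o tail."""
    P = tail
    for _ in range(n):
        P = prefix(action, P)
    return P

def witness(n, offset=0):
    """Complete tree of n levels of a-tests; leaf number j is b^j o S."""
    if n == 0:
        return power("b", offset, tail="S")
    return (witness(n - 1, 2 * offset), "a", witness(n - 1, 2 * offset + 1))

def approx(P, n):
    """The n-th approximation pi_n(P) of a finite thread."""
    if n == 0:
        return "D"
    if P in ("S", "D"):
        return P
    left, action, right = P
    return (approx(left, n - 1), action, approx(right, n - 1))

def n_residuals(P, n):
    """The set of n-residuals of a finite thread."""
    if n == 0:
        return {P}
    if P in ("S", "D"):
        return set()
    left, _, right = P
    return n_residuals(left, n - 1) | n_residuals(right, n - 1)

def has_a_n_property(P, n, action="a"):
    """pi_n(P) = a^n o D, at least 2^n - 1 different n-residuals (as stated in
    the text), and no residual whose first approximation equals a o D."""
    res = n_residuals(P, n)
    return (
        approx(P, n) == power(action, n)
        and len(res) >= 2 ** n - 1
        and all(approx(Q, 1) != prefix(action, "D") for Q in res)
    )

for n in range(1, 6):
    assert has_a_n_property(witness(n), n)
```

The witness for $n$ has $2^n$ different $n$-residuals, so it satisfies the property under both the count stated in the text and the count suggested in the footnote.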
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